Detailed Description
The technical solutions in the embodiments of the present application are described clearly below with reference to the accompanying drawings in the embodiments of the present application. Apparently, the described embodiments are some rather than all of the embodiments of the present application. All other embodiments obtained by a person skilled in the art based on the embodiments of the present application fall within the protection scope of the present application.
Some terms used in the embodiments of the present application are explained below.
1) The terms "first", "second", and the like in the specification and claims are used to distinguish between similar objects and do not necessarily describe a particular sequence or chronological order. It should be understood that terms so used are interchangeable under appropriate circumstances, so that the embodiments of the present application can be implemented in orders other than those illustrated or described herein. Objects distinguished by "first", "second", and the like are generally of one type, and their number is not limited; for example, there may be one or more first objects. In addition, "and/or" in the specification and claims means at least one of the connected objects, and the character "/" generally indicates an "or" relationship between the associated objects.
2) The term "at least one" and similar expressions in the specification and claims mean any one, any two, or any combination of two or more of the objects. For example, at least one of a, b, and c may represent "a", "b", "c", "a and b", "a and c", "b and c", or "a, b, and c", where each of a, b, and c may be single or plural. Similarly, "at least two" means two or more, and its meaning is analogous to that of "at least one".
In the present application, a mark is a word, symbol, image, or the like used to indicate information, and a control or another container may serve as the carrier for displaying the information. Marks include, but are not limited to, word marks, image marks, and symbol marks.
It should be noted that the operation execution method provided in the embodiments of the present application may be executed by an electronic device such as a mobile phone, a tablet computer, a notebook computer, a palmtop computer, or a vehicle-mounted electronic device. In some embodiments of the present application, the operation execution method is described using an electronic device as the execution body.
The operation execution method and device provided in the embodiments of the present application are described in detail below through specific embodiments and their application scenarios with reference to the accompanying drawings.
It can be understood that the operation execution method provided by the embodiment of the present application can be applied to any of the following scenarios:
1. Accessibility: providing automatic operation based on voice-control mapping for visually impaired users;
2. Automated testing: replacing the manual generation of application compatibility test scripts;
3. Internet of Things (IoT) device control: intelligently operating an appliance control application on a mobile phone.
In an accessibility scenario, when the user needs the electronic device to automatically execute a single operation instruction, for example when the input text or voice information is "open shopping application A", the user can enter "open shopping application A" in the intelligent assistant of the electronic device. The electronic device parses the text information input by the user and obtains an operation instruction for displaying the interface of shopping application A. The electronic device then automatically runs shopping application A and displays its interface. It can be understood that input in the intelligent assistant may be manual input or voice input; in general, this scheme is applied to voice input scenarios.
However, in an accessibility scenario, when the user needs the electronic device to automatically execute consecutive operation instructions, for example when the input text or voice information is "open shopping application A, search for a Bluetooth headset, and add it to the shopping cart", one of the following two schemes may be adopted:
Scheme 1: the user manually operates the electronic device to trigger it to perform the operation corresponding to each input.
For example, the user taps the icon of shopping application A to trigger the electronic device to display the search interface of shopping application A; the user then enters "Bluetooth headset" in the search box of the search interface to trigger the electronic device to display the product list interface for Bluetooth headsets; finally, the user selects one Bluetooth headset by tapping, which triggers the electronic device to add that headset to the shopping cart.
As a result, the user must input operation instructions multiple times and keep operating the device manually, so automatic execution of the operation instructions is not truly achieved and the process is cumbersome.
Scheme 2: the electronic device uses an accessibility service to prerecord an operation script, that is, a fixed recording of the operations for a given instruction.
For example, the user records an operation script for "open shopping application A, search for a Bluetooth headset, and add it to the shopping cart": the user taps shopping application A, enters "Bluetooth headset" in the search box, and adds a product to the shopping cart, while the electronic device records the position and manner of each of the user's touches on its screen. Later, when the electronic device receives this instruction from the user, it automatically replays the inputs at the positions and in the manners recorded in the script, thereby executing the instruction automatically.
However, if the position of a control in the interface of shopping application A changes after the script has been recorded, the recorded input position no longer corresponds to the control that needs to be triggered, so the operation instruction cannot be executed and the operation is interrupted.
Thus, existing schemes have difficulty understanding user intent dynamically while remaining compatible with interface changes.
In the operation execution method provided in the embodiments of the present application, the electronic device receives first information input by a user, where the first information includes at least one of voice information or text information; in response to the first information, performs semantic analysis on it to obtain a corresponding operation instruction sequence; obtains, based on the operation instruction sequence, dynamic control description language (DCDL) data of the operation object indicated by each operation instruction in the sequence; and executes the operation instructions in the sequence in order based on the DCDL data. In this scheme, by inputting text or voice information that contains consecutive operation instructions, the user triggers the electronic device to perform semantic analysis on those instructions directly and to obtain the DCDL data of the operation object indicated by each of them, so that the electronic device can automatically execute each operation instruction in turn according to that DCDL data. Automatically executing the consecutive operation instructions input by the user thus reduces the number of human-computer interaction steps.
The operation execution method provided in the embodiments of the present application may be executed by an operation execution device, which may be, for example, an electronic device or a component in the electronic device, such as an integrated circuit or a chip. The operation execution method provided in the embodiments of the present application is described below by using an electronic device as an example.
An embodiment of the present application provides an operation execution method, and FIG. 1 is a flowchart of the operation execution method provided in an embodiment of the present application; the method may be applied to an electronic device. As shown in FIG. 1, the operation execution method provided in an embodiment of the present application may include the following steps 201 to 204.
Step 201, the electronic device receives first information input by a user.
In some embodiments of the present application, the first information includes at least one of voice information or text information.
In some embodiments of the present application, the electronic device may obtain the voice information input by the user by receiving the voice input of the user.
For example, the electronic device may record the user's voice in the smartphone assistant to obtain the voice information input by the user.
In some embodiments of the present application, the electronic device may obtain text information input by the user by receiving touch input of the user.
For example, the electronic device may receive the user's touch input in the smartphone assistant and display the text information entered by the user.
In some embodiments of the present application, the first information is used to characterize an operation intention of the user.
In some embodiments of the present application, the electronic device may obtain the operation intention of the user by parsing the first information.
For example, when the first information input by the user is "I want to view Bluetooth headsets in shopping application A", the operation intention parsed by the electronic device is "run shopping application A, search for Bluetooth headsets on the search page, and display the product list interface for Bluetooth headsets".
Step 202, the electronic device responds to the first information, performs semantic analysis on the first information, and obtains an operation instruction sequence corresponding to the first information.
In some embodiments of the present application, the operation instruction sequence consists of the consecutive operation instructions, obtained by parsing, that characterize the operations the user requires the electronic device to perform.
In some embodiments of the present application, the sequence of operation instructions includes at least one operation instruction.
In some embodiments of the present application, the sequence of operation instructions may be obtained based on text input by the user, may be obtained based on speech input by the user, or may be obtained in other forms.
In some embodiments of the present application, the electronic device may obtain a piece of text information from the user's text input or voice input. The electronic device then inputs the text information into a large language model and extracts key text, such as keywords, by means of a prompt. The model then performs task decomposition on the key text and outputs the operation instruction sequence. It should be noted that the operation instruction sequence can be understood as a structured intention that characterizes the user's operation intention.
It can be understood that the operation instruction sequence is an ordered set of operation instructions.
Illustratively, the prompt template may be:
"You are an application operation assistant, skilled at summarizing and extracting keywords related to application operations and at removing modal particles, adjectives, adverbs, and other redundant words in the text that are unrelated to application operations or interface descriptions. You may refer to the following keywords to be extracted and redundant words to be removed.
Keyword dictionary: [keyword A, keyword B, keyword C, ...]
Redundant word dictionary: [uh, oh, please, could you, ...]
User input text: [text content]
Output instruction text: [instruction text]"
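For illustration, the template can be filled programmatically before it is sent to the model. The Python sketch below is not the method's implementation: `PROMPT_TEMPLATE`, `build_prompt`, and the dictionary contents are hypothetical, and the call to the large language model is omitted because the source does not name a model or an API.

```python
# Minimal sketch: fill the prompt template with the dictionaries and user text.
# All names and dictionary contents here are hypothetical.
PROMPT_TEMPLATE = (
    "You are an application operation assistant, skilled at summarizing and "
    "extracting keywords related to application operations and at removing "
    "modal particles, adjectives, adverbs, and other redundant words that are "
    "unrelated to application operations or interface descriptions. You may "
    "refer to the following keywords to be extracted and redundant words to be removed.\n"
    "Keyword dictionary: [{keywords}]\n"
    "Redundant word dictionary: [{redundant}]\n"
    "User input text: [{user_text}]\n"
    "Output instruction text: "
)


def build_prompt(user_text: str, keywords: list, redundant: list) -> str:
    """Substitute the dictionaries and the user's text into the template."""
    return PROMPT_TEMPLATE.format(
        keywords=", ".join(keywords),
        redundant=", ".join(redundant),
        user_text=user_text.strip(),
    )


prompt = build_prompt(
    "I want to search for a Bluetooth headset in shopping application A",
    keywords=["run", "search", "open"],
    redundant=["uh", "oh", "please"],
)
print(prompt)
```

The prompt deliberately ends at "Output instruction text: " so that the model's completion is the instruction text itself.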
For example, suppose the electronic device receives the voice input "I want to search for a Bluetooth headset in shopping application A". The electronic device first converts the voice into the text "I want to search for a Bluetooth headset in shopping application A" and then extracts the key text by filling this text into the prompt template of the large language model.
Specifically, the filled-in prompt is as follows:
"You are an application operation assistant, skilled at summarizing and extracting keywords related to application operations and at removing modal particles, adjectives, adverbs, and other redundant words in the text that are unrelated to application operations or interface descriptions. You may refer to the following keywords to be extracted and redundant words to be removed.
Keyword dictionary: [keyword A, keyword B, keyword C, ...]
Redundant word dictionary: [uh, oh, please, could you, ...]
User input text: [I want to search for a Bluetooth headset in shopping application A]
Output instruction text: [run shopping application A, search for Bluetooth headset]"
The large language model then performs task decomposition on the instruction text obtained through the prompt, "run shopping application A, search for Bluetooth headset", and finally outputs the operation instruction sequence: run shopping application A, locate the search box, input "Bluetooth headset", and trigger the search.
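The source does not specify the format in which the model returns the decomposed tasks. Assuming, hypothetically, that the prompt asks for a JSON array of steps, the reply could be turned into an ordered operation instruction sequence as sketched below; `OperationInstruction`, `parse_instruction_sequence`, and the `action`/`target`/`argument` fields are illustrative names only.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass(frozen=True)
class OperationInstruction:
    step: int           # 1-based position in the sequence
    action: str         # e.g. "run", "locate", "input", "trigger"
    target: str         # natural-language description of the operation object
    argument: str = ""  # optional payload, such as the text to type


def parse_instruction_sequence(model_output: str) -> list[OperationInstruction]:
    """Convert the model's JSON array reply into an ordered instruction sequence."""
    items = json.loads(model_output)
    return [
        OperationInstruction(
            step=i,
            action=item["action"],
            target=item["target"],
            argument=item.get("argument", ""),
        )
        for i, item in enumerate(items, start=1)
    ]


# Hypothetical model reply for "run shopping application A, search for Bluetooth headset".
reply = json.dumps([
    {"action": "run", "target": "shopping application A"},
    {"action": "locate", "target": "search box"},
    {"action": "input", "target": "search box", "argument": "Bluetooth headset"},
    {"action": "trigger", "target": "search control"},
])
for instruction in parse_instruction_sequence(reply):
    print(instruction.step, instruction.action, instruction.target, instruction.argument)
```

Keeping the step index explicit makes the ordering of the sequence independent of how the list is later stored or transmitted.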
In some embodiments of the present application, the electronic device performs semantic analysis on each operation instruction in the operation instruction sequence, and obtains a semantic analysis result corresponding to each operation instruction.
In some embodiments of the present application, the semantic analysis result is used to characterize information such as execution behavior, operation objects, and the like of the corresponding operation instruction.
In some embodiments of the present application, the semantic analysis result includes, but is not limited to, semantic keywords and text features.
In some embodiments of the present application, the electronic device may perform semantic analysis on each operation instruction by using a model or a semantic analysis technique, to obtain the semantic analysis result of each operation instruction.
A semantic keyword may be a word that appears in the operation instruction itself, or a keyword that is expected (inferred) from the operation instruction.
For example, when the operation instruction is "open shopping application A", the semantic analysis result may include "shopping application A", "commodity", "search", and the like.
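As a rough stand-in for the model-based semantic analysis described above, the following sketch extracts an execution behavior, the operation objects, and semantic keywords (including expected keywords that do not appear in the text) using hand-written lookup tables. The vocabularies and the `analyze` function are assumptions for illustration; a real system would rely on a model.

```python
# Hypothetical vocabularies; a real system would use a model or NLP pipeline.
KNOWN_OBJECTS = ["shopping application A", "search box", "Bluetooth headset", "shopping cart"]
VERB_TO_BEHAVIOR = {"open": "click", "run": "click", "tap": "click", "input": "input_text"}
# Keywords "expected" from an object even when they do not appear in the text.
EXPECTED_KEYWORDS = {"shopping application A": ["commodity", "search"]}


def analyze(instruction: str) -> dict:
    """Return the execution behavior, operation objects, and semantic keywords."""
    lowered = instruction.lower()
    objects = [obj for obj in KNOWN_OBJECTS if obj.lower() in lowered]
    words = lowered.split()
    behavior = VERB_TO_BEHAVIOR.get(words[0], "unknown") if words else "unknown"
    keywords = list(objects)
    for obj in objects:
        keywords.extend(EXPECTED_KEYWORDS.get(obj, []))
    return {"behavior": behavior, "objects": objects, "keywords": keywords}


print(analyze("open shopping application A"))
```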
Step 203, the electronic device obtains DCDL data of the operation object indicated by each operation instruction in the operation instruction sequence based on the operation instruction sequence.
In some embodiments of the application, the dynamic control description language (DCDL) is a domain-specific language for dynamically defining, configuring, and managing user interface controls or the behavior of system components; in the present application it serves as a structured data format for the standardized definition of controls in an application interface. DCDL is typically applied in scenarios that require dynamically generated interfaces, flexible configuration of control properties, or real-time adjustment of interaction logic, such as low-code platforms, industrial automation control, game UI systems, or complex enterprise-level applications.
In some embodiments of the present application, the core characteristics and uses of DCDL include the following two aspects:
1. Dynamic control generation
The type of each control (such as a button, an input box, or a chart), its layout position, its style, and its initial state are described with declarative syntax, so that the interface can be rendered dynamically at runtime without hard coding.
2. Event-driven logic binding
Control events, such as clicks and input changes, can be associated with back-end logic, such as interface calls and data processing, so that the interaction logic can be configured dynamically.
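The two characteristics can be illustrated with a toy declaration in Python: controls are described as data and turned into a runtime control table, and event names are bound to handler functions through a registry. The declaration format, the [left, top, right, bottom] layout of `bounds`, and the `render`/`dispatch` helpers are hypothetical; the source does not define a concrete DCDL syntax.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical DCDL-style declaration. "bounds" is assumed to be
# [left, top, right, bottom] in pixels; the field names are illustrative.
DECLARATION = [
    {"id": "search_bar", "type": "EditText", "bounds": [40, 120, 860, 220],
     "style": {"hint": "Search for merchandise"}, "state": {"text": ""},
     "events": {"on_change": "update_suggestions"}},
    {"id": "search_button", "type": "Button", "bounds": [880, 120, 1040, 220],
     "style": {"label": "Search"}, "state": {"enabled": True},
     "events": {"on_click": "call_search_api"}},
]

# Back-end logic registry: handler names in the declaration resolve to callables.
HANDLERS: dict[str, Callable[[dict], str]] = {
    "update_suggestions": lambda ctrl: f"suggestions for {ctrl['state']['text']!r}",
    "call_search_api": lambda ctrl: "search API called",
}


def render(declaration: list[dict]) -> dict[str, dict]:
    """Build the runtime control table from the declaration (no hard-coded UI)."""
    # Copy each control's mutable state so runtime changes leave the declaration intact.
    return {c["id"]: {**c, "state": dict(c["state"])} for c in declaration}


def dispatch(controls: dict[str, dict], control_id: str, event: str) -> str:
    """Look up the handler bound to an event and invoke it on the control."""
    handler_name = controls[control_id]["events"][event]
    return HANDLERS[handler_name](controls[control_id])


controls = render(DECLARATION)
controls["search_bar"]["state"]["text"] = "Bluetooth headset"
print(dispatch(controls, "search_bar", "on_change"))
print(dispatch(controls, "search_button", "on_click"))
```

Because the bindings are plain data, changing which back-end logic a control triggers only requires editing the declaration, not the rendering code.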
In some embodiments of the present application, the DCDL data is used to characterize relevant operation information of the operation object.
For example, when the operation object is an application, the operation instruction operates on the application icon, for example by clicking, long-pressing, or dragging it.
For example, when the operation instruction sequence is to open shopping application A, locate the search box, input "Bluetooth headset", and trigger the search, the operation objects are, respectively, the application icon of shopping application A, the search box of the search interface, the text input area, and the search control.
In some embodiments of the present application, the related operation information of the operation object, that is, the DCDL data, includes, but is not limited to, semantic description information of the operation object, coordinates of the operation object, operation type of the operation object, name of the operation object, attribute of the operation object, and the like.
Illustratively, the DCDL data may be understood as natural-language semantic descriptions of different operation objects.
In some embodiments of the present application, the DCDL data includes DCDL data in an application identification layer, DCDL data in an interface description layer, and DCDL data in a control description layer.
It is understood that the DCDL data may be DCDL data of an application, DCDL data of an interface, or DCDL data of a control.
In some embodiments of the present application, the DCDL data may be stored in a DCDL database.
In some embodiments of the present application, the DCDL database may include an application identification layer, an interface description layer, and a control description layer, each of which contains DCDL data characterizing relevant operation information of operation objects.
In some embodiments of the present application, the DCDL data in the application identification layer includes, but is not limited to, an application package name (app), an application description (app_description), and a screen resolution (resolution).
In some embodiments of the present application, the application package name uniquely identifies the target application.
Illustratively, the application package name may be a standard package name, such as com.
In some embodiments of the application, the application description provides semantic assistance for the electronic device in obtaining the DCDL data of the application.
Illustratively, the application description describes the functional attributes of the application in natural language; for example, if application A is a shopping application, its application description is "shopping application".
In some embodiments of the application, the screen resolution serves as the reference for coordinate scaling.
For example, the screen resolution may be the resolution of the display screen of the electronic device, such as 1080×2400, at the time the DCDL data of an application is generated.
In this way, the dual positioning mechanism of application package name plus application description ensures that the target application indicated by an operation instruction is uniquely identified and supplies semantic auxiliary information for model decisions, so that the electronic device can accurately determine the operation object from the operation instruction. In addition, the electronic device determines the coordinate position of the operation object with reference to the screen resolution recorded in the application identification layer, which avoids operation offsets caused by differences in screen size.
In some embodiments of the present application, the DCDL data in the interface description layer includes, but is not limited to, an interface identifier (activity_id), an interface name (activity), an interface semantic description (description), and semantic keywords (DCDL_keywords).
In some embodiments of the present application, the interface identifier is a system-level unique identifier used to track interface version changes.
Illustratively, the interface identifier may be an interface version number, such as activity 1.3.2.
In some embodiments of the present application, the electronic device determines the version of the current interface from its interface identifier. If the interface identifier of the current interface matches the identifier recorded in the DCDL data, the current interface has not been updated. If they do not match, the current interface has been updated; because its layout and controls may have changed, the electronic device needs to obtain the DCDL data of the current interface again.
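The version check can be sketched as a small cache keyed by interface name, under the assumption (not stated in the source) that regenerating DCDL data for an interface is exposed as a callable. `get_interface_dcdl`, `fake_regenerate`, and the version strings are hypothetical.

```python
from __future__ import annotations

from typing import Callable


def get_interface_dcdl(
    activity: str,
    current_activity_id: str,
    cache: dict[str, dict],
    regenerate: Callable[[str], dict],
) -> dict:
    """Reuse cached DCDL data if the interface identifier is unchanged, else rebuild it."""
    cached = cache.get(activity)
    if cached is not None and cached["activity_id"] == current_activity_id:
        return cached  # identifiers match: the interface has not been updated
    fresh = regenerate(activity)  # identifiers differ: layout or controls may have changed
    fresh["activity_id"] = current_activity_id
    cache[activity] = fresh
    return fresh


calls = []


def fake_regenerate(activity: str) -> dict:
    # Stand-in for parsing the live interface; records each call for illustration.
    calls.append(activity)
    return {"controls": ["search_bar", "voice_button"]}


cache = {"SearchActivity": {"activity_id": "1.3.2", "controls": ["search_bar"]}}
get_interface_dcdl("SearchActivity", "1.3.2", cache, fake_regenerate)  # cache hit
updated = get_interface_dcdl("SearchActivity", "1.4.0", cache, fake_regenerate)
print(calls, updated)
```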
In some embodiments of the present application, the interface name may be the class name of the interface, for example for the home page, the main interface (MainActivity), or the search interface.
In some embodiments of the present application, the interface semantic description describes the function of the interface in natural language, for example "product search interface", "payment interface", or "product detail page".
In some embodiments of the present application, the interface semantic description may be combined with the application package name of the application identification layer and the interface identifier to construct a three-dimensional identification system, so that the electronic device can accurately obtain the other DCDL data by detecting any one of them.
Thus, through the associative mapping among the interface identifier, the interface name, and the interface semantic description, the interface description layer constructs a three-dimensional identification system for the interface space and achieves compatible identification of interfaces across versions.
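A minimal sketch of the three-dimensional identification follows, assuming records are indexed within one application by each identifying dimension. Lookup tries the interface identifier first and falls back to the interface name and then the semantic description, so an interface whose version identifier changed can still be found. `InterfaceRecord`, `InterfaceIndex`, and the package name `com.example.shop` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InterfaceRecord:
    app: str          # application package name (hypothetical value below)
    activity_id: str  # interface identifier / version
    activity: str     # interface name (class name)
    description: str  # natural-language semantic description


class InterfaceIndex:
    """Index interface records so that any one identifying dimension finds them."""

    def __init__(self, records: list[InterfaceRecord]) -> None:
        self.by_id = {(r.app, r.activity_id): r for r in records}
        self.by_name = {(r.app, r.activity): r for r in records}
        self.by_description = {(r.app, r.description): r for r in records}

    def find(self, app: str, activity_id: Optional[str] = None,
             activity: Optional[str] = None,
             description: Optional[str] = None) -> Optional[InterfaceRecord]:
        # Most specific key first, then the semantic ones, so a changed
        # version identifier does not break identification.
        for table, key in ((self.by_id, activity_id),
                           (self.by_name, activity),
                           (self.by_description, description)):
            if key is not None and (app, key) in table:
                return table[(app, key)]
        return None


index = InterfaceIndex([InterfaceRecord(
    "com.example.shop", "1.3.2", "SearchActivity", "product search interface")])
# The app was updated to version 1.4.0, but the semantic description still matches.
print(index.find("com.example.shop", activity_id="1.4.0",
                 description="product search interface"))
```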
In some embodiments of the present application, the control description layer characterizes the position of each control and the fields related to its function and operation logic.
In some embodiments of the present application, the DCDL data in the control description layer includes, but is not limited to, a control name (ID), a control type (type), display text (text), control coordinate information (bounds), an operation type (action), a verification condition (verification), and an alternate positioning strategy (alternate_selectors).
In some embodiments of the application, the control name is the unique identifier of the control, such as search_bar for a search control, and is used for logical reference.
In some embodiments of the present application, the control semantic description describes the function of the control in natural language, such as "product search input box".
In some embodiments of the application, the control types described above are used to characterize control functionality.
Illustratively, the control types include, but are not limited to, a text input area (EditText), a radio option group (RadioButton), a button (Button), a vertical scrolling list (ListView), a picture display (ImageView), and a progress indicator (ProgressBar).
In some embodiments of the application, the operation type defines the action executed on the control.
Illustratively, the operation types include, but are not limited to, text input, a check (selection) operation, a click that triggers an immediate operation, a double click that triggers an immediate operation, vertical sliding, a long-press operation, and no operation.
It can be understood that one control type corresponds to at least one operation type.
For example, when the control type is a text input area, such as a search box, the operation type is text input, that is, the electronic device automatically enters the search text in the search box. When the control type is a radio option group, such as a gender selection group, the operation type is a selection operation, that is, the electronic device selects one of the gender options. When the control type is a button, such as a search button, the operation type is a click that triggers an immediate operation, that is, the electronic device triggers the search by clicking the search button.
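The correspondence between control types and operation types can be held in a lookup table. In the sketch below, the rows for EditText, RadioButton, and Button follow the examples above; the rows for ListView, ImageView, and ProgressBar, the operation names, and `plan_action` are assumptions.

```python
from __future__ import annotations

from typing import Optional

# Operation types permitted per control type. The first three rows follow the
# examples in the text; the remaining rows are assumptions for illustration.
CONTROL_OPERATIONS = {
    "EditText": ["input_text"],
    "RadioButton": ["select"],
    "Button": ["click"],
    "ListView": ["scroll_vertical", "click"],
    "ImageView": ["click", "double_click", "long_press"],
    "ProgressBar": ["none"],
}


def plan_action(control_type: str, requested: Optional[str] = None) -> str:
    """Return the operation to perform, defaulting to the first permitted one."""
    allowed = CONTROL_OPERATIONS.get(control_type, ["none"])
    if requested is None:
        return allowed[0]
    if requested not in allowed:
        raise ValueError(f"{requested!r} is not permitted on {control_type}")
    return requested


print(plan_action("EditText"), plan_action("Button"), plan_action("ListView", "click"))
```

Rejecting an operation that the control type does not support lets a mismatch between the parsed instruction and the matched control surface early, before any input is injected.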
In some embodiments of the present application, the display text is the text content currently displayed by the control, such as "Search for merchandise".
In some embodiments of the present application, the control coordinate information is used to indicate a position of the control in the interface.
For example, the control coordinate information may be absolute coordinates converted from an array of pixel values, for example [x1, y1]. Together with a dynamic coordinate conversion algorithm, these coordinates enable cross-device adaptation: reference coordinates are obtained from the screen resolution, and the position of the control is located within the reference coordinates based on the absolute coordinates.
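The source does not spell out the dynamic coordinate conversion algorithm. One simple possibility, assumed here for illustration, is proportional scaling against the screen resolution recorded in the application identification layer (for example 1080×2400); `scale_point` is a hypothetical helper.

```python
from __future__ import annotations


def scale_point(point: tuple[int, int],
                recorded_resolution: tuple[int, int],
                current_resolution: tuple[int, int]) -> tuple[int, int]:
    """Proportionally map a point recorded at one resolution onto another screen."""
    x, y = point
    recorded_w, recorded_h = recorded_resolution
    current_w, current_h = current_resolution
    return round(x * current_w / recorded_w), round(y * current_h / recorded_h)


# DCDL data recorded on a 1080x2400 screen, replayed on a 720x1600 screen.
print(scale_point((540, 1200), (1080, 2400), (720, 1600)))  # (360, 800)
```

Plain proportional scaling presumes that both screens share an aspect ratio and layout; differing aspect ratios or system bars would need further correction, which the source does not describe.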
In some embodiments of the present application, the alternate positioning strategy enables a multi-path positioning scheme when positioning by control coordinates fails.
In some embodiments of the present application, the alternate positioning strategy includes, but is not limited to, hierarchical path positioning and alternate text identification.
In some embodiments of the application, hierarchical path positioning locates the control by the storage path of the control's data in the system.
For example, the positioning path may be // android.
In some embodiments of the present application, alternate text identification searches the interface for the control's text label to capture the position of the control.
It can be understood that the electronic device uses alternate text identification to locate a control only when the control cannot otherwise be located.
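A sketch of the fallback order, assuming each recorded control carries a reference point, a hierarchical path, and a list of alternate texts. The source states only that alternate text identification is used when other positioning fails; trying the hierarchical path before alternate text, the node format, and the example paths (such as `/root/toolbar/input`) are assumptions, since the original path example is truncated.

```python
from __future__ import annotations

from typing import Optional


def contains(bounds: list[int], point: tuple[int, int]) -> bool:
    """True if point lies inside bounds given as [left, top, right, bottom]."""
    left, top, right, bottom = bounds
    x, y = point
    return left <= x <= right and top <= y <= bottom


def locate(control: dict, nodes: list[dict]) -> tuple[Optional[str], Optional[dict]]:
    """Try coordinates first, then the hierarchical path, then alternate texts."""
    for node in nodes:
        if node["id"] == control["id"] and contains(node["bounds"], control["point"]):
            return "coordinates", node
    path = control.get("path")
    if path:
        for node in nodes:
            if node.get("path") == path:
                return "hierarchy_path", node
    alternates = set(control.get("alternate_texts", []))
    for node in nodes:
        if node.get("text") in alternates:
            return "alternate_text", node
    return None, None


# After an app update, the search box moved and its id and path changed.
recorded = {"id": "search_bar", "point": (450, 170), "path": "/root/toolbar/input",
            "alternate_texts": ["Search for merchandise", "Search"]}
current_nodes = [{"id": "search_field", "bounds": [40, 300, 1040, 400],
                  "path": "/root/header/input", "text": "Search for merchandise"}]
print(locate(recorded, current_nodes)[0])  # alternate_text
```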
In some embodiments of the application, the verification condition defines the expected result after the operation is performed.
Illustratively, by comparing the actual outcome with the expected result, the electronic device can determine whether the execution result of the operation instruction is correct or incorrect.
In some embodiments of the application, the verification condition includes two verification dimensions: an interface jump identifier (post-condition) and visual feature markers (visual-markers).
In some embodiments of the application, the interface jump identifier describes the characteristics of the interface expected after the jump.
For example, when the interface jump identifier is "the shopping cart interface (CartActivity) appears", the electronic device considers the execution result of the operation instruction "add to shopping cart" correct if it detects the shopping cart interface, and incorrect if it does not.
In some embodiments of the application, the visual feature markers mark the visual elements that need to be verified.
For example, when the visual feature marker is a highlighted shopping cart icon, and the shopping cart icon is present and highlighted in the interface, the electronic device recognizes the highlighted icon and considers the execution result of the operation instruction "add to shopping cart" correct. When the shopping cart icon is absent from the interface, the electronic device does not recognize a highlighted shopping cart icon and considers the execution result incorrect.
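The two verification dimensions can be checked as follows. This sketch assumes, beyond what the source states, that both dimensions must hold when both are present; `verify`, the `CartActivity` and `ProductActivity` names, and the marker string are illustrative.

```python
from __future__ import annotations


def verify(verification: dict, current_activity: str, visible_markers: set[str]) -> bool:
    """Check the post-condition interface and every required visual marker."""
    expected_activity = verification.get("post-condition")
    if expected_activity is not None and current_activity != expected_activity:
        return False  # the expected interface jump did not happen
    required = verification.get("visual-markers", [])
    return all(marker in visible_markers for marker in required)


add_to_cart = {"post-condition": "CartActivity",
               "visual-markers": ["shopping_cart_icon_highlighted"]}
print(verify(add_to_cart, "CartActivity", {"shopping_cart_icon_highlighted"}))     # True
print(verify(add_to_cart, "ProductActivity", {"shopping_cart_icon_highlighted"}))  # False
```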
In this way, by mapping the unique identifier of each control to its natural-language description, the electronic device can understand the functional intent of the control, such as the input role of a "search box". By obtaining the operation type of each control, it learns the executable actions of the control, such as clicking, inputting, and sliding, and executes the corresponding operation instruction by associating preset operation logic, for example automatically invoking the virtual keyboard for an input action.
It can be understood that DCDL data is a structured semantic description system based on DCDL technology. Its idea is to abstract applications, application interfaces, and controls into computable, inferable semantic data structures and, through hierarchical definition, to enable dynamic perception and execution of cross-interface operations. This standardizes the interface data structures of applications across versions, so that the electronic device can operate them automatically. In other words, the electronic device stores the DCDL data of each application, of each of its interfaces, and of each control in those interfaces as natural-language descriptions, so that it can match different semantic analysis results against this semantic description information and obtain the operation information of the corresponding operation object, such as data about the application and its interfaces, and the display position, characteristics, name, and operation type of a control in an interface.
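To make the three layers concrete, the following Python dataclasses sketch one possible shape of a DCDL record, with field names taken from the layer descriptions above. The nesting (application, then interfaces, then controls), the package name `com.example.shop`, and all example values are hypothetical.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field


@dataclass
class ControlDCDL:  # control description layer
    id: str
    type: str
    text: str
    bounds: list[int]
    action: str
    verification: dict = field(default_factory=dict)
    alternate_selectors: list[str] = field(default_factory=list)


@dataclass
class InterfaceDCDL:  # interface description layer
    activity_id: str
    activity: str
    description: str
    dcdl_keywords: list[str]
    controls: list[ControlDCDL] = field(default_factory=list)


@dataclass
class AppDCDL:  # application identification layer
    app: str
    app_description: str
    resolution: list[int]
    interfaces: list[InterfaceDCDL] = field(default_factory=list)


record = AppDCDL(
    app="com.example.shop",  # hypothetical package name
    app_description="shopping application",
    resolution=[1080, 2400],
    interfaces=[InterfaceDCDL(
        activity_id="1.3.2",
        activity="SearchActivity",
        description="product search interface",
        dcdl_keywords=["search", "product"],
        controls=[ControlDCDL(
            id="search_bar", type="EditText", text="Search for merchandise",
            bounds=[40, 120, 860, 220], action="input_text",
            verification={"post-condition": "SearchResultActivity"},
            alternate_selectors=["text=Search for merchandise"],
        )],
    )],
)
print(json.dumps(asdict(record), indent=2))
```

Serializing to JSON in this way is one option for storing records in a DCDL database; the source does not prescribe a storage format.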
In some embodiments of the present application, the DCDL data may be DCDL data stored in a DCDL database, or may be DCDL data generated in real time.
For example, for a static interface, the electronic device may store the DCDL data of all elements in the interface in the DCDL database in advance.
For a dynamic interface, the electronic device may parse each element of the dynamically generated interface in real time and generate DCDL data for each element.
It can be understood that an element is any item in the interface, such as a control, a display area, or a mark, and the operation object may be any one of these elements.
In some embodiments of the present application, the electronic device obtains DCDL data matching the semantic analysis result of an operation instruction.
In some embodiments of the present application, the electronic device may use the semantic analysis result of an operation instruction to obtain matching DCDL data. This data may come from the DCDL data stored in a DCDL database or from DCDL data generated in real time.
In some embodiments of the present application, the electronic device may match the semantic analysis result corresponding to an operation instruction against the semantic description information in DCDL data. It then determines the successfully matched DCDL data as the DCDL data of the operation object indicated by the operation instruction.
In some embodiments of the present application, the electronic device may match the semantic analysis result corresponding to an operation instruction against the semantic description information in the DCDL data stored in the DCDL database. If matching DCDL data exists, the electronic device directly uses it as the DCDL data of the operation object indicated by the operation instruction. If none of the stored DCDL data matches the semantic analysis result, the electronic device parses all elements of the current interface and generates DCDL data for them. It then matches the semantic analysis result of the operation instruction against the generated DCDL data and uses the successfully matched DCDL data as the DCDL data of the operation object indicated by the operation instruction.
For example, the electronic device may match semantic keywords in the semantic analysis result of the operation instruction with semantic description information in the DCDL data.
For example, when the operation instruction is "open shopping application a", the semantic analysis result includes "shopping application a". The electronic device retrieves the semantic description information "shopping application a" from the DCDL data. It uses the corresponding DCDL data, namely "icon coordinates of shopping application a" and "operation type of shopping application a", as the DCDL data of the operation instruction. The electronic device can thus find the location of shopping application a and operate on it according to the operation type "click". This triggers the electronic device to display the display interface of shopping application a.
Suppose the semantic analysis of the operation instruction input by the user identifies the application name of a certain shopping application. The electronic device can then use both "shopping application" and the application name as semantic keywords of the operation instruction and match them against shopping application a in the DCDL data. In this case, the electronic device matches the DCDL data of shopping application a in the DCDL database. The acquired DCDL data includes not only the data in the application identification layer of the application but also information of other layers under the application. Examples include interface information of the shopping application and control information in its interfaces.
In some embodiments of the present application, when the electronic device acquires DCDL data of the operation object indicated by an operation instruction, it may package the acquired DCDL data into a data packet. That is, one data packet may contain DCDL data of one or more data layers. For example, a data packet may include data of the application identification layer of application a and data of the interface description layer of application a, such as the interface layout and control positions.
In some embodiments of the present application, the electronic device encapsulates the DCDL data of the operation object indicated by an operation instruction into an independent object and splices it onto the operation instruction. When the electronic device encounters the same operation instruction again, it directly parses the DCDL data carried in the operation instruction and performs the corresponding operation based on that data. It therefore does not need to query the DCDL database again, which improves the efficiency of automatically executing continuous operations.
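The reuse described above can be sketched as follows. This is one possible reading, assuming that the DCDL data resolved for an instruction is kept under the instruction text so that a repeated instruction skips the database query. The names OperationInstruction and DCDLResolver, and the substring match on descriptions, are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class OperationInstruction:
    text: str                    # e.g. "open shopping application a"
    dcdl: Optional[dict] = None  # DCDL data spliced onto the instruction

class DCDLResolver:
    """Hypothetical resolver that reuses DCDL data spliced onto earlier instructions."""

    def __init__(self, database: dict):
        self.database = database   # description -> DCDL record (assumed shape)
        self.spliced = {}          # instruction text -> DCDL data already resolved
        self.queries = 0           # number of database queries, for illustration

    def resolve(self, instruction: OperationInstruction) -> Optional[dict]:
        if instruction.dcdl is None and instruction.text in self.spliced:
            # Same instruction seen before: reuse its DCDL data, no query needed.
            instruction.dcdl = self.spliced[instruction.text]
        if instruction.dcdl is not None:
            return instruction.dcdl
        self.queries += 1
        for description, record in self.database.items():
            if description in instruction.text:
                instruction.dcdl = record          # splice onto the instruction
                self.spliced[instruction.text] = record
                return record
        return None

db = {"shopping application a": {"bounds": [40, 300, 160, 420], "operation_type": "click"}}
resolver = DCDLResolver(db)
first = resolver.resolve(OperationInstruction("open shopping application a"))
second = resolver.resolve(OperationInstruction("open shopping application a"))
print(resolver.queries)  # 1
```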
Step 204, the electronic device sequentially executes the operation instructions in the operation instruction sequence based on the DCDL data.
In some embodiments of the present application, the electronic device executes an operation instruction based on the DCDL data of the operation object indicated by that operation instruction.
In some embodiments of the present application, the electronic device sequentially executes the corresponding operation instructions in the operation instruction sequence based on the DCDL data of the operation object indicated by each operation instruction.
In some embodiments of the present application, the electronic device sequentially executes each of the operation instructions in the order of the operation instructions in the operation instruction sequence.
In some embodiments of the present application, the electronic device executes the operation instruction according to DCDL data representing operation information of the operation object.
In some embodiments of the present application, the operation information includes, but is not limited to, coordinates of the operation object, operation type of the operation object, resolution of the operation object, and screen resolution.
In some embodiments of the present application, the coordinates of the operation object indicate the position at which the electronic device locates the operation object.
In some embodiments of the present application, the operation type of the operation object indicates the type of operation that the electronic device performs on it, for example, clicking, double-clicking, sliding, or long-pressing.
In some embodiments of the present application, the resolution of the operation object and the resolution of the screen are both used for the electronic device to obtain a more accurate operation position.
In some embodiments of the present application, when the electronic device receives an operation instruction, it obtains the screen resolution recorded when the DCDL data was generated and uses it as a reference coordinate system. If the coordinate system of the operation object in other DCDL data differs from the reference coordinate system, the coordinate position of the operation object can be scaled proportionally. The electronic device can then accurately obtain the coordinate position of the operation object and execute the corresponding operation instruction.
Illustratively, suppose that when the operation instruction is executed, the screen resolution of the electronic device differs from the original screen resolution recorded in the DCDL data. If the actual and original screen resolutions are in an equal scaling relationship, that is, width and height are scaled by the same factor, a coordinate scaling factor is calculated. The coordinate (bounds) value is then dynamically converted into the coordinate system of the current device by multiplying each coordinate of the bounds value by the scaling factor. For example, suppose the original screen resolution is 1080x2400, the bounds value is [100, 100, 200, 200], and the actual resolution is 540x1200. The scaling factor is then 0.5, and the converted bounds value is [100 x 0.5, 100 x 0.5, 200 x 0.5, 200 x 0.5] = [50, 50, 100, 100]. If the two resolutions are not in an equal scaling relationship, the DCDL structure of the current interface needs to be parsed in real time to obtain the actual bounds value. The screen resolution (resolution) in the application identification layer is used together with a dynamic coordinate conversion algorithm. This achieves cross-device adaptive positioning of the absolute coordinates of a control and solves the problem of click offset caused by differences in screen size.
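The coordinate conversion above can be sketched as a small function. It assumes that bounds are [left, top, right, bottom] pixel values and that converted values are rounded to whole pixels. It returns None when the resolutions are not equally scaled, which signals that the caller must re-parse the current interface. The function name scale_bounds is hypothetical.

```python
def scale_bounds(bounds, original_res, actual_res):
    """Convert bounds recorded at original_res into the current device's coordinates.

    Returns None when the two resolutions are not in an equal scaling
    relationship; the caller then has to parse the current interface instead.
    """
    ow, oh = original_res
    aw, ah = actual_res
    # Equal scaling: aw / ow == ah / oh, compared exactly via cross-multiplication.
    if aw * oh != ah * ow:
        return None
    factor = aw / ow
    # Multiply every coordinate by the factor and round to whole pixels (assumption).
    return [round(v * factor) for v in bounds]

print(scale_bounds([100, 100, 200, 200], (1080, 2400), (540, 1200)))   # [50, 50, 100, 100]
print(scale_bounds([100, 100, 200, 200], (1080, 2400), (1080, 2340)))  # None
```

Cross-multiplying avoids floating-point comparison of the two ratios. The second call shows a resolution that is not equally scaled, so no converted value is produced.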
For example, suppose the operation instruction is to open shopping application a. According to the semantic analysis result of the operation instruction, the electronic device determines in the DCDL database that the application icon whose semantic description information is "shopping application a" is the operation object. It obtains the coordinates [x1, y1] of shopping application a from the DCDL data and uses them to determine the location of the application icon of shopping application a. It also learns from the obtained DCDL data that the operation type of this icon is a click. The electronic device therefore automatically performs a click input at position [x1, y1] on the desktop to open shopping application a, thereby executing the operation instruction "open shopping application a".
For example, the electronic device may obtain an unexecuted operation instruction from the operation instruction sequence and then obtain the DCDL data of the target interface and control in that operation instruction. It passes this DCDL data to the operating system, which automatically performs the corresponding operation.
For an interface, for example, the operating system of the electronic device may automatically open the interface according to the operation type of the interface. For a control in an interface, the operating system needs to further locate the position of the control before triggering automatic execution. That is, the electronic device further acquires the DCDL data of the control and then triggers the operating system to automatically execute the operation instruction based on that DCDL data.
For example, suppose the operation instructions are "locate the search box", "input 'bluetooth headset'", and "trigger search". The electronic device first locates the search box according to its coordinate position information. It then inputs the keyword "bluetooth headset" and finally automatically clicks the search button to trigger the search.
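As a rough illustration of how a control's operation type might be turned into an action for the operating system, the sketch below maps each type to a hypothetical action tuple aimed at the center of the control's bounds. Tapping the center of the bounds, the action names ("tap", "tap_and_type", "swipe"), and the function build_action are all assumptions. Real input injection is platform-specific and is deliberately left out.

```python
def center(bounds):
    """Center point of [left, top, right, bottom] bounds."""
    left, top, right, bottom = bounds
    return ((left + right) // 2, (top + bottom) // 2)

def build_action(control, text=None):
    """Translate a control's DCDL data into a hypothetical low-level action."""
    op = control["operation_type"]
    x, y = center(control["bounds"])
    if op == "click":
        return ("tap", x, y)
    if op == "input":
        # Preset logic: focusing the input control brings up the virtual
        # keyboard, after which the keyword is typed.
        return ("tap_and_type", x, y, text or "")
    if op == "slide":
        _, top, _, bottom = control["bounds"]
        return ("swipe", x, bottom, x, top)   # swipe upward through the control
    raise ValueError(f"unsupported operation type: {op}")

steps = [
    ({"bounds": [80, 120, 1000, 200], "operation_type": "input"}, "bluetooth headset"),
    ({"bounds": [1000, 120, 1060, 200], "operation_type": "click"}, None),
]
actions = [build_action(control, text) for control, text in steps]
print(actions)
# [('tap_and_type', 540, 160, 'bluetooth headset'), ('tap', 1030, 160)]
```

Here the first step combines locating the search box with entering the keyword, and the second step clicks the search button.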
In the operation execution method, the electronic device receives first information input by a user, where the first information includes at least one of voice information and text information. In response to the first information, the electronic device performs semantic analysis on it to obtain a corresponding operation instruction sequence. Based on this sequence, it obtains DCDL data of the operation object indicated by each operation instruction, and then sequentially executes the operation instructions based on the DCDL data. In this scheme, the user inputs text or voice information containing continuous operation instructions. This triggers the electronic device to perform semantic analysis directly on those instructions and obtain DCDL data of the operation object indicated by each of them. The electronic device can then automatically execute each operation instruction in turn according to that DCDL data. Automatically executing the continuous operation instructions input by the user thus saves human-computer interaction steps.
Optionally, in some embodiments of the present application, step 203 may be implemented by the following step 203a or step 203b.
In step 203a, when there is first DCDL data matching the semantic analysis result of the first operation instruction in the DCDL database, the electronic device uses the first DCDL data as DCDL data of the operation object indicated by the first operation instruction.
In step 203b, when there is no DCDL data matching the semantic analysis result of the first operation instruction in the DCDL database, the electronic device generates second DCDL data based on the semantic analysis result of the first operation instruction, and uses the second DCDL data as DCDL data of the operation object indicated by the first operation instruction.
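Steps 203a and 203b can be read as a two-tier lookup: stored DCDL data first, real-time generation second. The sketch below assumes records are dictionaries with a "description" field and uses plain substring matching for brevity. The function acquire_dcdl and the parse_live_interface stand-in for the DCDL parsing engine are hypothetical.

```python
def acquire_dcdl(keyword, database, parse_current_interface):
    """Sketch of steps 203a/203b: prefer stored DCDL data, else generate it."""
    # Step 203a: first DCDL data from the database whose description matches.
    for record in database:
        if keyword in record["description"]:
            return record, "database"
    # Step 203b: parse the current interface and generate second DCDL data.
    for record in parse_current_interface():
        if keyword in record["description"]:
            return record, "generated"
    return None, "not found"

def parse_live_interface():
    # Stand-in for the DCDL parsing engine running on the current interface.
    return [{"description": "search box", "operation_type": "input"}]

database = [{"description": "wireless network switch", "operation_type": "click"}]

print(acquire_dcdl("wireless network", database, parse_live_interface)[1])  # database
print(acquire_dcdl("search box", database, parse_live_interface)[1])        # generated
```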
In some embodiments of the present application, the semantic analysis result of the first operation instruction is a semantic analysis result obtained after the electronic device performs semantic analysis on the first operation instruction, and the specific description may refer to the description in step 202.
In some embodiments of the application, the first operation instruction is one operation instruction in the operation instruction sequence.
In one possible embodiment, the electronic device may find semantic description information in the DCDL database that matches the semantic analysis result of the first operation instruction. In that case, it determines the DCDL data indicated by that semantic description information, namely the first DCDL data, as the DCDL data of the operation object indicated by the first operation instruction.
For example, a static interface may be a settings interface or a personal center interface. For such an interface, the electronic device may store the interface information in advance in the DCDL database, including the interface layout, the interface name, and the DCDL data of the controls in the interface.
For example, when the operation instruction is to open shopping application a, the electronic device needs to find the DCDL data of the matched shopping application a, such as the application package name (e.g., com.example.shapp) and the screen resolution. It does so by fuzzy matching between the structured intent and the DCDL database. The structured intent is the semantic description keyword (DCDL_keyword) in the operation instruction, and it is matched against the semantic description information (description) field in the DCDL database.
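The description does not specify the fuzzy matching algorithm. As one possible stand-in, the sketch below scores each record's description field against the DCDL_keyword using Python's standard difflib.SequenceMatcher ratio. It picks the best candidate above a threshold. The threshold of 0.6, the record layout, and the function name fuzzy_match are assumptions.

```python
from difflib import SequenceMatcher

def fuzzy_match(dcdl_keyword, records, threshold=0.6):
    """Return the record whose description best matches the keyword, or None."""
    best, best_score = None, 0.0
    for record in records:
        score = SequenceMatcher(None, dcdl_keyword.lower(),
                                record["description"].lower()).ratio()
        if score > best_score:
            best, best_score = record, score
    # Reject weak matches so an unrelated record is never returned.
    return best if best_score >= threshold else None

records = [
    {"description": "Shopping application A", "package": "com.example.shapp",
     "resolution": (1080, 2400)},
    {"description": "Settings", "package": "com.example.settings",
     "resolution": (1080, 2400)},
]
match = fuzzy_match("shopping app a", records)
print(match["package"])  # com.example.shapp
```

A similarity ratio tolerates abbreviations such as "shopping app a", while the threshold lets the caller fall back to real-time generation when nothing matches well enough.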
In another possible embodiment, the DCDL database may contain no semantic description information matching the semantic analysis result of the first operation instruction. In that case, the electronic device parses the current interface and finds the operation object in it that matches the semantic analysis result of the first operation instruction. It then generates DCDL data in real time and determines that DCDL data, namely the second DCDL data, as the DCDL data of the operation object indicated by the first operation instruction.
For a dynamic interface, such as a merchandise listing interface, the electronic device may generate DCDL data in real time through the DCDL parsing engine.
The DCDL parsing engine parses the current interface through the system's accessibility service (AccessibilityService) or a built-in program of the specific application. It then passes the parsing result to the operating system to obtain the control tree of the current interface in real time, from which the DCDL data is generated.
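The step from a control tree to DCDL data might look like the sketch below, which walks the tree depth-first and emits one record per described, operable node. The node dictionaries are a hypothetical, simplified model loosely inspired by properties that Android's AccessibilityNodeInfo exposes, such as text, content description, and whether a node is clickable, editable, or scrollable. The mapping from those flags to operation types and the function name tree_to_dcdl are assumptions.

```python
def tree_to_dcdl(node, records=None):
    """Flatten a (hypothetical) control tree into DCDL-style control records."""
    if records is None:
        records = []
    description = node.get("content_desc") or node.get("text")
    # Assumed mapping from node capabilities to an operation type.
    if node.get("editable"):
        op = "input"
    elif node.get("scrollable"):
        op = "slide"
    elif node.get("clickable"):
        op = "click"
    else:
        op = None
    if description and op:
        records.append({"control_id": node["id"], "description": description,
                        "bounds": node["bounds"], "operation_type": op})
    for child in node.get("children", []):   # depth-first traversal
        tree_to_dcdl(child, records)
    return records

tree = {"id": "root", "bounds": [0, 0, 1080, 2400], "children": [
    {"id": "search_box", "text": "search box", "bounds": [80, 120, 1000, 200],
     "editable": True},
    {"id": "search_btn", "content_desc": "search", "bounds": [1000, 120, 1060, 200],
     "clickable": True},
    {"id": "goods_list", "content_desc": "commodity list",
     "bounds": [0, 220, 1080, 2400], "scrollable": True},
]}
records = tree_to_dcdl(tree)
print([r["operation_type"] for r in records])  # ['input', 'click', 'slide']
```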
For example, suppose the operation instruction is "locate the search box". The electronic device has already completed the previous operation instruction at this point. It may therefore directly generate DCDL data of the search interface from the data information of that interface, including the DCDL data of all controls in the search interface. Using the semantic description keyword "search box", the electronic device then matches the semantic description information in this DCDL data to obtain the DCDL data of the search box control.
It should be noted that, in one case, the corresponding DCDL data of every operation instruction in the operation instruction sequence can be found in the DCDL database. In another case, the DCDL data of some operation instructions can be found in the DCDL database, while that of the others must be obtained from DCDL data generated in real time. For example, the electronic device has executed the operation instruction for opening shopping application a. It has packaged the corresponding DCDL data into a data packet that includes the interface DCDL data of all interfaces in shopping application a and the DCDL data of all controls. At this point, the electronic device has opened the home page of shopping application a and triggered an action that enters the next interface. The electronic device can then obtain the interface ID of the current interface directly from the data packet and search the DCDL database by that ID. If the data exists, it acquires the corresponding DCDL data directly. Otherwise, it generates the DCDL data in real time through the DCDL parsing engine.
It should be noted that, the above procedure of acquiring DCDL data for each operation instruction in the operation instruction sequence may refer to the description procedure of the above steps.
Therefore, the electronic device constructs a two-layer architecture consisting of a pre-stored DCDL database and a real-time generation engine. It retains precompiled data for frequently operated interfaces and also supports dynamic generation of data for temporary interfaces, such as the control descriptions of a commodity list page.
Optionally, in some embodiments of the present application, after the electronic device sequentially executes the operation instructions in the operation instruction sequence based on the DCDL data in step 204, the operation execution method provided in the embodiment of the present application further includes step 301.
Step 301, after each execution of one operation instruction in the operation instruction sequence, the electronic device displays an interface corresponding to the operation instruction.
In some embodiments of the present application, the interface corresponding to the operation instruction is an interface displayed by an execution result after the operation instruction is executed.
In some embodiments of the present application, the interfaces corresponding to each operation instruction may be the same or different.
In some embodiments of the present application, a cross-interface operation instruction results in a displayed interface that differs from the interface of the previous operation instruction. An operation instruction that operates on the current interface leaves the displayed interface unchanged, although its layout may change.
For example, the interface displayed after each operation instruction is executed is as follows:
- "open shopping application a": the main interface of shopping application a.
- "locate search box": the search interface of shopping application a.
- "input bluetooth headset": still the search interface of shopping application a, but the search box now shows "bluetooth headset".
- "trigger search": the commodity list interface for bluetooth headsets.
Therefore, each time the electronic device executes an operation instruction, it clearly shows the user the operation process and result. The user can then promptly adjust the operation process according to the execution result.
Optionally, in some embodiments of the present application, after step 204, the operation execution method provided by the embodiments of the present application further includes steps 401 to 403.
Step 401, the electronic device executes the second operation instruction in the operation instruction sequence based on the DCDL data of the operation object it indicates. The electronic device then verifies the execution result of the second operation instruction to obtain a verification result.
In some embodiments of the present application, the second operation instruction is one operation instruction in the operation instruction sequence.
It should be noted that the first operation instruction and the second operation instruction may be the same operation instruction or different operation instructions. Each is simply one operation instruction in the operation instruction sequence.
In some embodiments of the present application, the verification result is used to indicate that the execution result of the operation instruction is correct or incorrect.
In some embodiments of the present application, the electronic device performs verification according to a verification condition corresponding to the second operation instruction, so as to obtain a verification result of the second operation instruction.
In some embodiments of the present application, each time one operation instruction in the operation instruction sequence is executed, the electronic device obtains a screenshot of the interface displayed for that operation instruction.
In some embodiments of the present application, after the electronic device executes the second operation instruction, the interface screenshot corresponding to the second operation instruction is matched with the verification condition corresponding to the second operation instruction, so as to verify the second operation instruction.
In some embodiments of the present application, the electronic device compares the image feature information of the interface screenshot corresponding to the second operation instruction with the verification condition corresponding to that instruction. If they match, it determines that the verification result of the second operation instruction is correct. If they do not match, it determines that the verification result is incorrect.
In some embodiments of the present application, the electronic device takes a screenshot of the interface corresponding to the second operation instruction. It extracts image feature information from the screenshot using an image matching algorithm such as OpenCV template matching or OCR. It then matches the image feature information against the verification conditions in the DCDL data of the interface. A match indicates that the execution meets expectations, and the verification result of the second operation instruction is determined to be correct. Otherwise, the verification result of the second operation instruction is determined to be incorrect.
If the verification condition is a highlight mark on the operation object, the electronic device detects whether the highlight mark exists in the screenshot of the current interface. If it exists, the verification result of the second operation instruction is determined to be correct. If it does not, the verification result is determined to be incorrect.
If the verification condition is an interface jump identifier, the electronic device detects whether the interface jump identifier exists in the screenshot of the current interface. If it exists, the verification result of the second operation instruction is determined to be correct. If it does not, the verification result is determined to be incorrect.
In some embodiments of the present application, the electronic device may further verify the execution result of the second operation instruction through a decision model. It will be appreciated that the validation process of the decision model may refer to the description of the validation process described above.
Step 402, if the verification result indicates that the execution result of the second operation instruction is correct, the electronic device changes the value of the completion status identifier carried by the second operation instruction to the first value.
In some embodiments of the present application, an operation instruction carries a completion status identifier.
The completion status identifier may be a boolean value, a binary value, or any other type of identifier.
In some embodiments of the present application, the completion status identifier is used to indicate an execution status of a corresponding operation instruction.
In some embodiments of the present application, the completion status identifier indicates different execution status of the corresponding operation instruction by different values.
The execution state includes an executed state and an unexecuted state.
In some embodiments of the present application, if the verification result indicates that the execution result of the second operation instruction is correct, the electronic device changes the value of the completion status identifier carried by the second operation instruction from the second value to the first value.
In some embodiments of the present application, the electronic device indicates that the corresponding operation instruction is in the executed state by the first value, and indicates that the corresponding operation instruction is in the unexecuted state by the second value.
The first value and the second value are different values.
For example, a completion status identifier of 1 indicates that the corresponding operation instruction is in the executed state. A completion status identifier of 0 indicates that it is in the unexecuted state.
It should be noted that when an operation instruction in the operation instruction sequence is marked as executed, the electronic device may automatically pass the associated DCDL data to the subsequent operation instruction. The subsequent operation instruction therefore does not need to acquire duplicate DCDL data again.
For example, suppose the completion status identifier of the operation instruction "open the search interface" is the first value, indicating that the instruction is in the executed state. The DCDL data of the search interface is then passed to the next operation instruction, "locate the search box". The DCDL data of the search interface includes the DCDL data of all controls in that interface. The electronic device can therefore obtain the DCDL data of the search box directly from it without querying the DCDL database.
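The completion status identifier and the chained hand-over of DCDL data can be sketched as follows. This assumes 1 and 0 as the first and second values, as in the example above, and an interface record whose controls are keyed by semantic keyword. The Instruction class, its fields, and mark_done_and_chain are hypothetical.

```python
from dataclasses import dataclass, field

EXECUTED, UNEXECUTED = 1, 0   # first value / second value, as in the example above

@dataclass
class Instruction:
    text: str
    target: str                  # semantic keyword of the operation object
    status: int = UNEXECUTED     # completion status identifier
    dcdl: dict = field(default_factory=dict)

def mark_done_and_chain(instructions, index, interface_dcdl):
    """Mark instruction `index` as executed and hand its interface DCDL data on."""
    instructions[index].status = EXECUTED
    if index + 1 < len(instructions):
        nxt = instructions[index + 1]
        # The interface DCDL data already lists every control in the interface,
        # so the next instruction takes its control data without a database query.
        control = interface_dcdl["controls"].get(nxt.target)
        if control is not None:
            nxt.dcdl = control

seq = [Instruction("open the search interface", "search interface"),
       Instruction("locate the search box", "search box")]
search_interface = {"controls": {
    "search box": {"bounds": [80, 120, 1000, 200], "operation_type": "input"}}}
mark_done_and_chain(seq, 0, search_interface)
print(seq[0].status, seq[1].dcdl["operation_type"])  # 1 input
```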
Optionally, in some embodiments of the present application, after step 402, the operation execution method provided by the embodiments of the present application further includes step 402a.
In step 402a, if the value of the completion status identifier carried by the third operation instruction is the second value, the electronic device executes the third operation instruction based on the DCDL data of the operation object indicated by the third operation instruction.
In some embodiments of the present application, the third operation instruction is a next operation instruction of the second operation instruction in the operation instruction sequence.
It should be noted that the first operation instruction, the second operation instruction, and the third operation instruction may be the same operation instruction or different operation instructions. Each is simply one operation instruction in the operation instruction sequence.
In some embodiments of the present application, the second value indicates that the corresponding operation instruction is in an unexecuted state.
In some embodiments of the present application, after the verification of the second operation instruction ends, the electronic device finds the first operation instruction in the unexecuted state in the operation instruction sequence, namely the third operation instruction, and continues execution.
Step 403, if the verification result indicates that the execution result of the second operation instruction is wrong, the electronic device re-acquires DCDL data of the operation object indicated by the second operation instruction.
In some embodiments of the present application, the electronic device sequentially executes the operation instructions in the operation instruction sequence based on the DCDL data of each operation object. After executing each operation instruction, it determines the execution result. If the execution is correct, the electronic device continues with the next operation instruction. If the execution is incorrect, it re-acquires the DCDL data of the operation object corresponding to that operation instruction and executes the operation instruction again according to the re-acquired DCDL data, until the execution is correct.
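The execute, verify, and retry flow described above can be sketched as a loop over the sequence. The acquire, execute, and verify callables are hypothetical hooks standing in for DCDL acquisition, operation execution, and screenshot verification. Unlike the text, which retries "until the execution is correct", this sketch caps the attempts at max_attempts (an assumption) so that a persistent failure cannot loop forever.

```python
def run_sequence(instructions, acquire, execute, verify, max_attempts=3):
    """Execute unexecuted instructions in order, verifying and retrying on failure.

    instructions: list of dicts with a "done" flag (completion status identifier).
    acquire(instr) -> DCDL data; execute(instr, dcdl) -> screenshot features;
    verify(instr, features) -> bool. All three are hypothetical hooks.
    """
    for instr in instructions:
        if instr["done"]:
            continue                      # already in the executed state
        for _ in range(max_attempts):
            dcdl = acquire(instr)         # (re-)acquire DCDL data of the operation object
            features = execute(instr, dcdl)
            if verify(instr, features):
                instr["done"] = True      # set the identifier to the executed value
                break
        else:
            raise RuntimeError(f"could not execute: {instr['text']}")
    return instructions

attempts = {}

def acquire(instr):
    attempts[instr["text"]] = attempts.get(instr["text"], 0) + 1
    return {"attempt": attempts[instr["text"]]}

def execute(instr, dcdl):
    # Simulated failure: the first try of "trigger search" leaves the screen unchanged.
    if instr["text"] == "trigger search" and dcdl["attempt"] == 1:
        return {"search interface"}
    return {instr["expect"]}

def verify(instr, features):
    return instr["expect"] in features

seq = [{"text": "input bluetooth headset", "expect": "search interface", "done": False},
       {"text": "trigger search", "expect": "commodity list", "done": False}]
run_sequence(seq, acquire, execute, verify)
print(attempts)  # {'input bluetooth headset': 1, 'trigger search': 2}
```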
It should be noted that the above description takes the second operation instruction as an example of the execution process of one operation instruction. The execution process of every operation instruction in the operation instruction sequence may refer to that of the second operation instruction.
Therefore, the electronic device passes DCDL data along the chain of operation instructions as their completion status identifiers change. This saves DCDL query steps and improves the efficiency of automatically executing operation instructions.
Optionally, in some embodiments of the present application, step 401 may be specifically implemented by the following steps 401a and 401b, or by steps 401a and 401c.
Step 401a, the electronic device matches the image feature information of the interface screenshot corresponding to the second operation instruction with the verification condition corresponding to the second operation instruction.
In some embodiments of the present application, the image feature information is used to characterize the display content of the interface image.
In some embodiments of the application, the electronic device may obtain the image feature information in the screenshot image through a model or through OCR.
In some embodiments of the present application, the image characteristic information may be identification information, layout information, or physical element information.
For example, where the screenshot image is a shopping cart interface image, the image feature information may be a shopping cart identification, shopping cart text, or the like.
Step 401b, when the image feature information is matched with the verification condition corresponding to the second operation instruction, the electronic device determines that the execution result of the second operation instruction is correct.
Step 401c, the electronic device determines that the execution result of the second operation instruction is wrong when the image feature information is not matched with the verification condition corresponding to the second operation instruction.
In some embodiments of the present application, the verification condition is a verification condition in the DCDL data of the first interface.
In some embodiments of the present application, the verification conditions include, but are not limited to, an interface jump identifier (post_condition) and visual markers (visual_markers).
It should be noted that, for a description of the verification condition, reference may be made to the description of the DCDL data; details are not repeated here.
In some embodiments of the present application, the electronic device takes a screenshot of the result page, recognizes it with an image matching algorithm such as OpenCV template matching or OCR, and matches the recognition result against the verification condition in the DCDL data of the interface. If they match, the execution meets expectations and the operation instruction is determined to have been executed correctly; if they do not match, the execution does not meet expectations and the operation instruction is determined to have been executed incorrectly.
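The matching in steps 401a to 401c can be sketched as follows. This is a minimal illustration under stated assumptions, not the embodiment's implementation: it assumes the screenshot has already been recognized (for example by OCR, or by OpenCV template matching via cv2.matchTemplate) into a foreground activity identifier and a list of recognized texts. The class and function names (VerificationCondition, RecognitionResult, verify_execution) are hypothetical; only the field names post_condition and visual_markers come from the text.

```python
from dataclasses import dataclass, field


@dataclass
class VerificationCondition:
    # Hypothetical shape of the verification condition stored in the DCDL data.
    post_condition: str  # expected activity identifier after the interface jump
    visual_markers: list[str] = field(default_factory=list)  # texts expected on screen


@dataclass
class RecognitionResult:
    # Hypothetical output of screenshot recognition (OCR texts plus current activity).
    activity_id: str
    recognized_texts: list[str]


def verify_execution(result: RecognitionResult, cond: VerificationCondition) -> bool:
    """Return True if the recognized interface satisfies the verification condition."""
    # Interface jump check: the foreground activity must be the expected one.
    if result.activity_id != cond.post_condition:
        return False
    # Visual marker check: every expected marker must appear in some recognized text.
    return all(
        any(marker in text for text in result.recognized_texts)
        for marker in cond.visual_markers
    )
```

Requiring both checks mirrors the text's pairing of an interface jump identifier with visual markers; a real system might instead weight them or use similarity thresholds.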
In this way, result verification is performed according to the verification condition in the DCDL data of the operation instruction, so that, when operation instructions are executed automatically, the electronic device can ensure that the execution result of each operation instruction meets expectations.
Optionally, in some embodiments of the present application, before step 401, the operation execution method provided by the embodiments of the present application further includes step 501 and step 502.
Step 501, after executing the second operation instruction based on the DCDL data of the operation object indicated by the second operation instruction, the electronic device adds the first identifier to the second operation instruction in the operation instruction sequence.
In some embodiments of the present application, the first identifier is used to indicate the operation instruction most recently executed by the electronic device.
In some embodiments of the present application, the first identifier may be a boolean value, a binary value, or any other identifier.
In some embodiments of the present application, the first identifier takes a form different from that of the completion status identifier.
In some embodiments of the present application, each time the electronic device executes an operation instruction, it marks that operation instruction with the first identifier.
Step 502, the electronic device inputs the interface screenshot corresponding to the second operation instruction and the operation instruction sequence marked with the first identifier into the decision model.
In some embodiments of the present application, the decision model is used to verify the operation instruction marked with the first identifier in the operation instruction sequence.
In some embodiments of the present application, the decision model may verify whether an operation instruction has been executed correctly by detecting whether the current interface satisfies the verification condition.
In some embodiments of the present application, the operation instruction sequence marked with the first identifier may be understood as an operation instruction sequence in which one of the operation instructions is marked with the first identifier.
For example, the operation instruction sequence input to the decision model may be: operation instruction 1 completed, operation instruction 2 completed, operation instruction 3 (wire) not completed, and operation instruction 4 not completed. Here, "completed" and "not completed" are values of the completion status identifier, and "wire" is the first identifier.
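A minimal sketch of how the first identifier might be attached and how the marked sequence might be serialized as decision-model input is given below. The Instruction class, the last_executed flag standing in for the first identifier, and the one-line-per-instruction text format are assumptions for illustration; the text does not specify the input format of the decision model.

```python
from dataclasses import dataclass


@dataclass
class Instruction:
    name: str
    completed: bool = False      # completion status identifier
    last_executed: bool = False  # stands in for the first identifier ("wire")


def mark_last_executed(seq: list[Instruction], index: int) -> None:
    """Step 501 (sketch): mark seq[index] as the most recently executed instruction."""
    for i, ins in enumerate(seq):
        # Only one instruction carries the first identifier at a time.
        ins.last_executed = i == index


def to_model_input(seq: list[Instruction]) -> str:
    """Serialize the marked sequence in a hypothetical one-line-per-instruction format."""
    lines = []
    for ins in seq:
        status = "completed" if ins.completed else "not completed"
        marker = " [wire]" if ins.last_executed else ""
        lines.append(f"{ins.name}{marker}: {status}")
    return "\n".join(lines)


if __name__ == "__main__":
    seq = [Instruction(f"operation instruction {n}", completed=n <= 2) for n in range(1, 5)]
    mark_last_executed(seq, 2)  # operation instruction 3 was just executed
    print(to_model_input(seq))
```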
Optionally, in some embodiments of the present application, in combination with steps 501 and 502, step 401a may be implemented by the following steps 401a1 and 401a2.
Step 401a1, the electronic device matches, through the decision model, the image feature information of the interface screenshot corresponding to the second operation instruction with the verification condition corresponding to the second operation instruction.
In some embodiments of the present application, each time an operation instruction in the operation instruction sequence is executed, a screenshot of the interface displayed for that operation instruction is obtained as its interface screenshot.
In some embodiments of the present application, after executing the second operation instruction, the electronic device inputs the interface screenshot corresponding to the second operation instruction and the operation instruction sequence marked with the first identifier into the decision model. The decision model extracts the image feature information of the interface screenshot and matches it with the verification condition corresponding to the second operation instruction, so as to verify the second operation instruction.
In some embodiments of the present application, when the decision model determines that the image feature information of the interface screenshot corresponding to the second operation instruction matches the verification condition corresponding to the second operation instruction, the electronic device determines that the verification result of the second operation instruction is correct; when the decision model determines that the image feature information of the interface displayed after the second operation instruction is executed does not match that verification condition, the electronic device determines that the verification result of the second operation instruction is incorrect.
It should be noted that the specific verification process is the same as that of step 401; reference may be made to the description of step 401, and details are not repeated here.
Step 401a2, after verifying the execution result of the second operation instruction, the electronic device clears the first identifier and outputs the operation instruction sequence.
In some embodiments of the present application, clearing the first identifier prevents this operation instruction from being confused with another operation instruction marked with the first identifier the next time the decision model verifies an operation instruction.
For example, the operation instruction sequence input by the electronic device into the decision model is: operation instruction 1 completed, operation instruction 2 completed, operation instruction 3 not completed, and operation instruction 4 not completed. Through the decision model, the electronic device identifies operation instruction 3, which carries the first identifier (wire), and the decision model then verifies whether operation instruction 3 was executed correctly. If it was, the first identifier (wire) is cleared, the completion status identifier of operation instruction 3 is changed to the first value (executed), and the operation instruction sequence is output as: operation instruction 1 completed, operation instruction 2 completed, operation instruction 3 completed, and operation instruction 4 not completed. The electronic device then automatically executes operation instruction 4.
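The bookkeeping around step 401a2 can be sketched as below, assuming the decision model's judgment is available as a verify callback (hypothetical; the decision model itself is not modeled). On success the sketch clears the first identifier, sets the completion status identifier to the first value (True here) and returns the next unexecuted instruction; on failure it clears the identifier and returns the same instruction so that it can be re-executed after its DCDL data is re-acquired.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Instruction:
    name: str
    completed: bool = False      # completion status identifier (True = first value)
    last_executed: bool = False  # first identifier ("wire")


def apply_decision(seq: list[Instruction],
                   verify: Callable[[Instruction], bool]) -> Optional[Instruction]:
    """Verify the marked instruction, clear its mark, and return what to run next.

    Returns None when every instruction is completed.
    """
    marked = next((ins for ins in seq if ins.last_executed), None)
    if marked is not None:
        ok = verify(marked)
        marked.last_executed = False  # clear the first identifier (step 401a2)
        if not ok:
            return marked             # re-execute after re-acquiring DCDL data
        marked.completed = True       # change the completion status to the first value
    # Next instruction whose completion status identifier still holds the second value.
    return next((ins for ins in seq if not ins.completed), None)
```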
In this way, the verification condition (verification), including the interface jump identifier (post_condition) and visual markers (visual_markers), defines the expected result of each operation, and automatic verification of the operation result is achieved by matching the image recognition result against the logical state, ensuring that the operation flow proceeds as expected. In addition, marking the most recently executed operation instruction with the first identifier effectively ensures that the decision model judges the correct operation instruction.
The detailed procedure of the operation execution method provided by the embodiments of the present application is described below with reference to a specific embodiment.
Example 1:
In this embodiment, it is assumed that the user says "I want to search for a Bluetooth headset in shopping application A" to the voice assistant of a mobile phone. As shown in fig. 2, the operation execution method provided by the present application includes the following steps A1 to A5.
Step A1, voice input analysis
Step A1-1, voice signal reception: the user's voice instruction "I want to search for a Bluetooth headset in shopping application A" is acquired through the microphone of the mobile phone.
Step A1-2, voice feature extraction: acoustic feature encoding is performed using the Mel-frequency cepstral coefficient (MFCC) algorithm.
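The text does not detail the MFCC computation. For reference, one widely used formulation (the exact variant, the number of filters M and the number of coefficients N used here are not specified and are assumptions) maps frequency f in Hz to the mel scale, and then takes a discrete cosine transform of the log energies E_m of M mel-spaced triangular filters to give the n-th cepstral coefficient c_n:

```latex
\mathrm{mel}(f) = 2595 \log_{10}\!\left(1 + \frac{f}{700}\right), \qquad
c_n = \sum_{m=1}^{M} \log(E_m)\,\cos\!\left[\frac{\pi n}{M}\left(m - \tfrac{1}{2}\right)\right],
\quad n = 0, 1, \dots, N-1
```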
Step A1-3, automatic speech recognition (ASR) conversion: a cloud speech recognition application programming interface (API) is called to convert the speech into text information based on the voice features, yielding the original text.
Step A1-4, semantic cleaning: a large model is used to extract the key text information by removing redundant words unrelated to app operations or interface descriptions, such as modal particles, adjectives, and adverbs, and a standardized instruction text is output. To further improve accuracy, a redundant-word dictionary (for example, "uh", "that", "hello", "please") and a keyword dictionary (for example, one built by extracting all description fields in the DCDL database as keywords for the large model to refer to) may be added.
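The dictionary-based part of semantic cleaning could look like the following sketch. It only illustrates pre-filtering with a redundant-word dictionary and spotting entries of a keyword dictionary; the large-model extraction step is not shown. The dictionary entries, the whitespace tokenization (which suits English rather than Chinese text) and the function names strip_redundant_words and spot_keywords are illustrative assumptions.

```python
import re

# Illustrative redundant-word dictionary (hypothetical entries).
REDUNDANT_WORDS = {"uh", "um", "hello", "please"}


def strip_redundant_words(text: str, redundant: set[str] = REDUNDANT_WORDS) -> str:
    """Drop dictionary filler words before the text is passed to the large model."""
    tokens = re.findall(r"\S+", text)
    # Compare case-insensitively and ignore trailing punctuation on each token.
    kept = [t for t in tokens if t.lower().strip(",.!?") not in redundant]
    return " ".join(kept)


def spot_keywords(text: str, keywords: list[str]) -> list[str]:
    """Return the keyword-dictionary entries (e.g. DCDL descriptions) found in the text."""
    lowered = text.lower()
    return [k for k in keywords if k.lower() in lowered]
```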
For example, the text information is inserted into the prompt of the large model to obtain the key text information "search for a Bluetooth headset in shopping application A".
Step A2, intent structuring
Intent structuring may be understood, for example, as performing task decomposition on the instruction text and converting it into an operation instruction sequence.
For example, a large language model performs task decomposition on the instruction text and outputs a structured intent, namely the operation instruction sequence.
For example, each operation instruction in the operation instruction sequence may contain the following four core attributes:
1) Operation instruction semantic description: the operation instruction is summarized in natural language, providing an interpretable basis for the decisions of the large language model, for example "open shopping application A".
2) Semantic description keywords (DCDL_keywords): keywords extracted through semantic analysis and used for fuzzy matching against the descriptions in the DCDL database, so that the target interface and control can be retrieved dynamically. This mechanism overcomes the limitation of traditional fixed package-name matching and supports intent generalization across application scenarios. Examples: shopping application A, merchandise, search.
3) DCDL associated data: the DCDL data corresponding to the operation object, that is, the DCDL data of the target interface or control, such as the package name, activity ID, and control ID, which establishes a direct mapping between the operation instruction and a physical interface element. For the complete, detailed fields, see the description of the DCDL database in the technical solution.
4) Completion status identifier (completed): marks the execution status of the operation instruction with a Boolean value and is used to build the temporal context of the operation flow.
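The four core attributes can be represented, for example, as the record below. The class name and the field names other than completed are assumptions, as are the package name and activity values in the example instance; the text does not prescribe a concrete schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class OperationInstruction:
    # 1) natural-language semantic description of the instruction
    description: str
    # 2) semantic description keywords used for fuzzy matching in the DCDL database
    dcdl_keywords: list[str]
    # 3) DCDL associated data of the operation object (interface or control)
    dcdl_data: dict = field(default_factory=dict)
    # 4) completion status identifier; False until the instruction is verified
    completed: bool = False


open_app = OperationInstruction(
    description="Open shopping application A",
    dcdl_keywords=["shopping application A", "merchandise", "search"],
    dcdl_data={
        "package_name": "com.example.shoppinga",  # hypothetical package name
        "activity_id": "MainActivity",            # hypothetical activity identifier
    },
)

print(json.dumps(asdict(open_app), indent=2))
```

Serializing to JSON is one convenient way to place the sequence in a large-model prompt; any structured text form would serve.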
For example, the complete operation instruction sequence output here is as follows, with the values and roles of the core attributes in each operation instruction described in detail:
1) Open shopping application A (the operation instruction semantic description)
The DCDL_keywords field, that is, the semantic description keywords "jindong, merchandise, purchase, search", is obtained from the operation instruction, and the DCDL data of the entry interface of shopping application A is retrieved from the DCDL database by fuzzy matching. The DCDL associated data field of this operation instruction contains the complete DCDL structure of the entry interface, such as metadata including the application package name, activity_id, control name, and control coordinates, and completed is initialized to the incomplete state, indicating that the operation instruction has not yet been executed.
2) Locate the search box (the operation instruction semantic description)
The DCDL associated data field is the DCDL data of the associated search box control and contains execution parameters such as the control's coordinate position (bounds) and operation type (action). completed is initialized to the incomplete state and is updated to the completed state after the system performs the click/focus operation.
3) Enter the keyword "Bluetooth headset" (the operation instruction semantic description)
The specific input value "Bluetooth headset" is injected into the DCDL data of the located search box control. completed is initialized to the incomplete state and is updated to the executed state once the input is complete.
4) Trigger the search (the operation instruction semantic description)
The DCDL associated data field is the DCDL data of the bound search button and contains execution parameters such as the click coordinates (bounds) and an alternate positioning strategy (alternate_selectors). The verification condition (validation field) presets the features of the jump target page, and the completed flag is updated only after the jump succeeds.
5) Verify the result list (the operation instruction semantic description)
The DCDL associated data field is the DCDL data of the associated search result page, and the result is verified through detection of an activity_id change and through the visual features in visual_markers, such as the merchandise list title. The completed state of this operation instruction serves as the final completion flag of the entire operation flow.
In this way, the whole operation instruction sequence encapsulates the operation parameters in nested DCDL objects, the DCDL_keywords field supports interface retrieval across application versions, and the chain of completed fields forms a state machine model, together forming a closed loop of interface positioning, control operation, and result verification.
Step A3, automatically executing the operation instruction sequence
Based on the operation instruction sequence output in step A2, the mobile phone may obtain the first unexecuted operation instruction in the sequence. After obtaining the DCDL data of the target interface (such as the search interface or the merchandise list interface) and of the control in that operation instruction, both of which can be understood as the operation object, the mobile phone passes the obtained DCDL data to the operating system for automatic operation.
For example, for an interface, the system automatically opens the interface; for a control in the interface, the system further locates the control position (bounds) and then triggers automatic execution. For instance, the system locates the search box and enters the keyword "Bluetooth headset", then locates the search button, and finally clicks it automatically to trigger the search.
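Locating a control from its bounds and deriving a click point might be sketched as follows. The sketch assumes that bounds uses the Android view-hierarchy string form "[left,top][right,bottom]" (as produced, for example, by a UI Automator dump) and that the click is sent with "adb shell input tap x y"; both are assumptions about the execution backend, which the text leaves to the operating system. The helper names are hypothetical, and the command is only built, not run.

```python
import re

# Android view-hierarchy bounds string, e.g. "[48,120][1032,216]".
_BOUNDS = re.compile(r"\[(-?\d+),(-?\d+)\]\[(-?\d+),(-?\d+)\]")


def parse_bounds(bounds: str) -> tuple[int, int, int, int]:
    """Parse a bounds string into (left, top, right, bottom)."""
    m = _BOUNDS.fullmatch(bounds.strip())
    if m is None:
        raise ValueError(f"unrecognized bounds: {bounds!r}")
    left, top, right, bottom = map(int, m.groups())
    return left, top, right, bottom


def tap_point(bounds: str) -> tuple[int, int]:
    """Click at the center of the control's bounding box."""
    left, top, right, bottom = parse_bounds(bounds)
    return (left + right) // 2, (top + bottom) // 2


def adb_tap_command(bounds: str) -> list[str]:
    """Build an adb tap command for the control; pass it to subprocess.run to execute."""
    x, y = tap_point(bounds)
    return ["adb", "shell", "input", "tap", str(x), str(y)]
```

Tapping the center is a simple default; the alternate_selectors mentioned for the search button suggest a fallback when the primary bounds are stale.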
Step A4, intelligent loop decision
For example, each time an operation instruction is executed, that operation instruction is marked in the operation instruction sequence, and the marked sequence is input into the decision model, which judges whether the operation instruction was executed correctly. If it was, the completion status identifier of the operation instruction is changed in the sequence, the sequence is output, and the next operation instruction is executed. If it was not, the DCDL data associated with the operation instruction is changed in the sequence, the sequence is output, and the operation instruction is executed again based on the changed DCDL data, until it is executed correctly.
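The loop in step A4 can be summarized by the sketch below. execute, verify and refresh_dcdl are hypothetical callbacks standing in for automatic execution, the decision model's judgment, and re-acquisition of DCDL data. The max_retries cap is an added assumption: the text retries until execution is correct, and the cap is included only so that the sketch always terminates.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Instruction:
    name: str
    dcdl_data: dict = field(default_factory=dict)
    completed: bool = False  # completion status identifier


def run_sequence(seq: list[Instruction],
                 execute: Callable[[Instruction], None],
                 verify: Callable[[Instruction], bool],
                 refresh_dcdl: Callable[[Instruction], dict],
                 max_retries: int = 3) -> bool:
    """Execute instructions in order, verifying each and retrying with new DCDL data.

    Returns True if every instruction completes, False if one exceeds max_retries.
    """
    for ins in seq:
        if ins.completed:  # already verified earlier; skip it
            continue
        for _ in range(max_retries + 1):
            execute(ins)
            if verify(ins):
                ins.completed = True  # change the completion status identifier
                break
            ins.dcdl_data = refresh_dcdl(ins)  # change the DCDL data, then retry
        else:
            return False  # retries exhausted without a correct execution
    return True
```

Bounding the retries lets the caller decide what to do when an interface cannot be matched, rather than looping indefinitely.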
Step A5, verification of operation results
For example, when the completion status identifier of the last operation instruction has been changed to the completed state, a screenshot of the result page is taken and recognized with an image matching algorithm such as OpenCV template matching or OCR, and the recognition result is matched against the DCDL data of the interface. If they match, the current operation instruction sequence is complete; if they do not match, the flow may return to the decision in step A4.
Thus, this embodiment offers, first, cross-system compatibility: DCDL data is independent of the application development language and is compatible with different operating systems; second, dynamic adaptability: the continuous update mechanism of the DCDL database can meet the need to automatically execute operation instructions in newly installed applications; and third, dynamic decision-making capability: complex operation flows that do not follow a preset path are handled through the closed loop of real-time generation of interface DCDL data and model decisions. This ensures both the efficiency with which the electronic device automatically executes multiple operation instructions and that the results meet expectations.
The above method embodiments, or the various possible implementations therein, may be executed separately or in any combination of two or more, as determined by actual use requirements; this is not limited in the embodiments of the present application.
The operation execution method provided by the embodiments of the present application may be executed by an electronic device or by an operation execution device. In the embodiments of the present application, the operation execution device provided by the embodiments of the present application is described by taking an operation execution device executing the operation execution method as an example.
Fig. 3 shows a schematic diagram of one possible structure of an operation execution device according to an embodiment of the present application. As shown in fig. 3, the operation execution device 700 may include a receiving module 701, a processing module 702, and an acquiring module 703.
The receiving module 701 is configured to receive first information input by a user, where the first information includes at least one of voice information or text information;
The processing module 702 is configured to perform semantic analysis on the first information in response to the first information received by the receiving module 701, to obtain an operation instruction sequence corresponding to the first information;
The acquiring module 703 is configured to acquire DCDL data of an operation object indicated by each operation instruction in the operation instruction sequence based on the operation instruction sequence;
The processing module 702 is further configured to sequentially execute the operation instructions in the operation instruction sequence based on the DCDL data.
Optionally, in some embodiments of the present application, the above-mentioned acquiring module 703 is specifically configured to:
When first DCDL data matching the semantic analysis result of the first operation instruction exists in the DCDL database, use the first DCDL data as the DCDL data of the operation object indicated by the first operation instruction; or
When no DCDL data matching the semantic analysis result of the first operation instruction exists in the DCDL database, generate second DCDL data based on the semantic analysis result of the first operation instruction, and use the second DCDL data as the DCDL data of the operation object indicated by the first operation instruction;
where the first operation instruction is one operation instruction in the operation instruction sequence.
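The lookup-or-generate behavior of the acquiring module could be sketched as follows. The scoring rule (the fraction of keywords contained in a DCDL entry's "description" field), the 0.6 threshold, and the generate callback that stands in for producing second DCDL data are all assumptions made for illustration; the text only states that fuzzy matching is used and that DCDL data is generated when no match exists.

```python
from typing import Callable, Optional


def match_score(keywords: list[str], description: str) -> float:
    """Fraction of keywords that occur (case-insensitively) in a DCDL description."""
    if not keywords:
        return 0.0
    desc = description.lower()
    return sum(k.lower() in desc for k in keywords) / len(keywords)


def acquire_dcdl(keywords: list[str],
                 database: list[dict],
                 generate: Callable[[list[str]], dict],
                 threshold: float = 0.6) -> dict:
    """Return matching (first) DCDL data, or generated (second) DCDL data if none matches."""
    best: Optional[dict] = None
    best_score = 0.0
    for entry in database:
        score = match_score(keywords, entry.get("description", ""))
        if score > best_score:
            best, best_score = entry, score
    if best is not None and best_score >= threshold:
        return best
    # No sufficiently similar entry: fall back to generating DCDL data.
    return generate(keywords)
```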
Optionally, in some embodiments of the present application, as shown in fig. 3 and fig. 4, the apparatus 700 further includes a verification module 704. When the operation instructions in the operation instruction sequence are sequentially executed based on the DCDL data, the verification module 704 is configured to verify, after the second operation instruction in the operation instruction sequence is executed based on the DCDL data of the operation object indicated by the second operation instruction, the execution result of the second operation instruction, so as to obtain a verification result;
The processing module 702 is further configured to change a value of the completion status identifier carried by the second operation instruction to a first value if the verification result indicates that the execution result of the second operation instruction is correct;
The processing module 702 is further configured to re-acquire DCDL data of the operation object indicated by the second operation instruction if the verification result indicates that the execution result of the second operation instruction is wrong;
wherein the second operation instruction is one of the operation instructions in the operation instruction sequence;
each operation instruction carries a completion status identifier, the completion status identifier is used to indicate the execution status of the corresponding operation instruction, and the first value indicates that the corresponding operation instruction is in the executed state.
Optionally, in some embodiments of the present application, the processing module 702 is further configured to execute the third operation instruction based on DCDL data of the operation object indicated by the third operation instruction if a value of the completion status identifier carried by the third operation instruction is a second value;
the third operation instruction is the operation instruction following the second operation instruction in the operation instruction sequence, and the second value indicates that the corresponding operation instruction is in the unexecuted state.
Optionally, in some embodiments of the present application, as shown in fig. 5 in conjunction with fig. 4, the apparatus further includes an intercepting module 705. Before the verification module 704 verifies the execution result of the second operation instruction, the intercepting module 705 is configured to capture, each time one operation instruction in the operation instruction sequence is executed, the interface displayed for that operation instruction, so as to obtain the interface screenshot corresponding to that operation instruction;
The verification module 704 is specifically configured to:
matching the image characteristic information of the interface screenshot corresponding to the second operation instruction with the verification condition corresponding to the second operation instruction;
if the image feature information matches the verification condition corresponding to the second operation instruction, determining that the execution result of the second operation instruction is correct, or
determining that the execution result of the second operation instruction is wrong if the image feature information does not match the verification condition corresponding to the second operation instruction.
The operation execution device provided by the embodiments of the present application receives first information input by a user, where the first information includes at least one of voice information or text information; performs semantic analysis on the first information in response to the first information to obtain an operation instruction sequence corresponding to the first information; acquires, based on the operation instruction sequence, DCDL data of the operation object indicated by each operation instruction in the sequence; and sequentially executes the operation instructions in the sequence based on the DCDL data. In this solution, by inputting text information or voice information containing consecutive operation instructions, the user triggers the operation execution device to perform semantic analysis directly on the consecutive operation instructions and obtain the DCDL data of the operation object indicated by each of them, so that the operation execution device can automatically execute the operation instructions in turn according to that DCDL data. Automatically executing the consecutive operation instructions input by the user thus reduces the number of human-computer interaction steps.
The operation execution device in the embodiments of the present application may be an electronic device or a component in an electronic device, such as an integrated circuit or a chip. The electronic device may be a terminal or a device other than a terminal. For example, the electronic device may be a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a vehicle-mounted electronic device, a mobile Internet device (MID), an augmented reality (AR)/virtual reality (VR) device, a robot, a wearable device, an ultra-mobile personal computer (UMPC), a netbook, or a personal digital assistant (PDA), and may also be a server, a network attached storage (NAS), a personal computer (PC), a television (TV), a teller machine, a self-service machine, or the like, which is not specifically limited in the embodiments of the present application.
The operation executing device in the embodiment of the present application may be a device having an operating system. The operating system may be an Android operating system, an iOS operating system, or other possible operating systems, and the embodiment of the present application is not limited specifically.
The operation execution device provided by the embodiments of the present application can implement each process implemented in the above operation execution method embodiments and achieve the same technical effects; to avoid repetition, details are not described here again.
Optionally, as shown in fig. 6, an embodiment of the present application further provides an electronic device 800, including a processor 801 and a memory 802. The memory 802 stores a program or instructions executable on the processor 801; when executed by the processor 801, the program or instructions implement each step of the above operation execution method embodiments and achieve the same technical effects. To avoid repetition, details are not described here again.
The electronic device in the embodiments of the present application includes mobile electronic devices and non-mobile electronic devices.
Fig. 7 is a schematic diagram of a hardware structure of an electronic device implementing an embodiment of the present application.
The electronic device 100 includes, but is not limited to, a radio frequency unit 101, a network module 102, an audio output unit 103, an input unit 104, a sensor 105, a display unit 106, a user input unit 107, an interface unit 108, a memory 109, and a processor 110.
Those skilled in the art will appreciate that the electronic device 100 may further include a power supply (such as a battery) that supplies power to the components. The power supply may be logically connected to the processor 110 through a power management system, so that functions such as charging management, discharging management, and power consumption management are implemented through the power management system. The structure of the electronic device shown in fig. 7 does not constitute a limitation on the electronic device; the electronic device may include more or fewer components than shown, combine certain components, or use a different component arrangement, and details are not described here.
Wherein, the user input unit 107 is configured to receive first information input by a user, where the first information includes at least one of voice information or text information;
The processor 110 is configured to perform semantic analysis on the first information in response to the first information received by the user input unit 107, to obtain an operation instruction sequence corresponding to the first information;
the processor 110 is configured to obtain DCDL data of an operation object indicated by each operation instruction in the operation instruction sequence based on the operation instruction sequence;
the processor 110 is further configured to sequentially execute the operation instructions in the operation instruction sequence based on the DCDL data.
Optionally, in some embodiments of the present application, the processor 110 is specifically configured to:
When first DCDL data matching the semantic analysis result of the first operation instruction exists in the DCDL database, use the first DCDL data as the DCDL data of the operation object indicated by the first operation instruction; or
When no DCDL data matching the semantic analysis result of the first operation instruction exists in the DCDL database, generate second DCDL data based on the semantic analysis result of the first operation instruction, and use the second DCDL data as the DCDL data of the operation object indicated by the first operation instruction;
where the first operation instruction is one operation instruction in the operation instruction sequence.
Optionally, in some embodiments of the present application, when the operation instructions in the operation instruction sequence are sequentially executed based on the DCDL data, the processor 110 is further configured to verify, after the second operation instruction in the operation instruction sequence is executed based on the DCDL data of the operation object indicated by the second operation instruction, the execution result of the second operation instruction, so as to obtain a verification result;
the processor 110 is further configured to change a value of a completion status identifier carried by the second operation instruction to a first value if the verification result indicates that the execution result of the second operation instruction is correct;
the processor 110 is further configured to, if the verification result indicates that the execution result of the second operation instruction is wrong, re-acquire DCDL data of the operation object indicated by the second operation instruction;
wherein the second operation instruction is one of the operation instructions in the operation instruction sequence;
each operation instruction carries a completion status identifier, the completion status identifier is used to indicate the execution status of the corresponding operation instruction, and the first value indicates that the corresponding operation instruction is in the executed state.
Optionally, in some embodiments of the present application, the processor 110 is further configured to execute the third operation instruction based on DCDL data of the operation object indicated by the third operation instruction if a value of the completion status identifier carried by the third operation instruction is a second value;
the third operation instruction is the next operation instruction of the second operation instruction in the operation instruction sequence, and the second value indicates that the corresponding operation instruction is in an unexecuted state.
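A pass over the sequence that executes only instructions still carrying the "unexecuted" value could look like the sketch below. It is illustrative only. The integer constants, the `Instruction` type, and the `execute` callback are hypothetical. The choice to stop at the first instruction that cannot be verified is also an assumption, made so that later steps do not run on an unverified interface state.

```python
from dataclasses import dataclass
from typing import Callable, List

EXECUTED = 1    # hypothetical "first value"
UNEXECUTED = 0  # hypothetical "second value"


@dataclass
class Instruction:
    name: str
    dcdl: dict
    status: int = UNEXECUTED


def run_sequence(sequence: List[Instruction], execute: Callable[[Instruction], bool]) -> List[str]:
    """Walk the sequence in order, executing only instructions still marked unexecuted.

    execute() is a hypothetical callback that runs one instruction from its
    DCDL data and returns True once the result has been verified.
    Returns the names of the instructions executed in this pass.
    """
    executed = []
    for instruction in sequence:
        if instruction.status != UNEXECUTED:
            continue  # already completed, e.g. in an earlier pass
        if not execute(instruction):
            break  # assumption: stop rather than continue from an unverified state
        instruction.status = EXECUTED
        executed.append(instruction.name)
    return executed
```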
Optionally, in some embodiments of the present application, before the processor 110 verifies the execution result of the second operation instruction, the processor 110 is further configured to obtain, each time an operation instruction in the operation instruction sequence is executed, an interface screenshot corresponding to that operation instruction;
The processor 110 is specifically configured to:
match the image feature information of the interface screenshot corresponding to the second operation instruction against the verification condition corresponding to the second operation instruction;
determine that the execution result of the second operation instruction is correct if the image feature information matches the verification condition corresponding to the second operation instruction; or
determine that the execution result of the second operation instruction is wrong if the image feature information does not match the verification condition corresponding to the second operation instruction.
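The text does not specify what image feature information or verification condition is used. Purely as an illustration, the sketch below substitutes a simple average hash as the image feature and models the verification condition as a reference hash plus a tolerated Hamming distance. `average_hash`, `VerificationCondition`, and `verify_screenshot` are hypothetical names. The screenshot is assumed to be already converted to a small grayscale grid, so no imaging library is needed.

```python
from dataclasses import dataclass
from typing import List, Sequence

Grid = Sequence[Sequence[int]]


def average_hash(gray: Grid) -> List[int]:
    """Average hash: 1 where a pixel is brighter than the mean, else 0.

    Assumes the screenshot has already been reduced to a small grayscale
    grid (e.g. 8x8); resizing is omitted to keep the sketch dependency-free.
    """
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    return [1 if p > mean else 0 for p in pixels]


@dataclass
class VerificationCondition:
    # Hypothetical form of the verification condition: a reference hash of
    # the expected interface plus a tolerance in differing bits.
    reference_hash: List[int]
    max_distance: int


def hamming(a: List[int], b: List[int]) -> int:
    return sum(x != y for x, y in zip(a, b))


def verify_screenshot(gray: Grid, condition: VerificationCondition) -> bool:
    """True if the screenshot's feature information matches the condition."""
    features = average_hash(gray)
    if len(features) != len(condition.reference_hash):
        return False
    return hamming(features, condition.reference_hash) <= condition.max_distance
```

A tolerance above zero absorbs small rendering differences such as a changed clock in the status bar; a production system would more likely use richer features (for example, detected controls or text).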
In the electronic device provided by the embodiments of the present application, the electronic device receives first information input by a user, the first information including at least one of voice information or text information; in response to the first information, performs semantic analysis on the first information to obtain an operation instruction sequence corresponding to the first information; obtains, based on the operation instruction sequence, the DCDL data of the operation object indicated by each operation instruction in the operation instruction sequence; and sequentially executes the operation instructions in the operation instruction sequence based on the DCDL data. In this solution, by inputting text information or voice information containing consecutive operation instructions, the user triggers the electronic device to perform semantic analysis directly on the consecutive operation instructions and thereby obtain the DCDL data of the operation object indicated by each of them, so that the electronic device can automatically execute each operation instruction in turn according to that DCDL data. Automatically executing the consecutive operation instructions input by the user in this way reduces the number of human-machine interaction steps.
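The overall flow (input, semantic analysis into an ordered instruction sequence, DCDL acquisition for every instruction, then in-order execution) can be sketched end to end. This is a toy illustration under loud assumptions. Voice input is assumed to be already transcribed to text. The "semantic analysis" here is a hypothetical keyword split on the word "then", not a language model. The functions `parse_instructions`, `acquire_dcdl`, and `run_command`, and the placeholder DCDL dictionaries, are all hypothetical.

```python
import re
from typing import Callable, Dict, List


def parse_instructions(text: str) -> List[str]:
    """Hypothetical stand-in for semantic analysis of text input.

    Splits the command on the word "then" and normalizes each step; a real
    system would use a language-understanding model instead.
    """
    steps = []
    for part in re.split(r"\bthen\b", text, flags=re.IGNORECASE):
        # Drop a trailing "and" (as in "... and then ...") and stray commas.
        part = re.sub(r"\s+and$", "", part.strip(), flags=re.IGNORECASE).strip(" ,")
        if part:
            steps.append(part.lower())
    return steps


def acquire_dcdl(step: str, dcdl_db: Dict[str, dict]) -> dict:
    """Look up DCDL data for a step, generating placeholder data if none matches."""
    data = dcdl_db.get(step)
    return data if data is not None else {"step": step, "generated": True}


def run_command(
    text: str,
    dcdl_db: Dict[str, dict],
    execute: Callable[[str, dict], None],
) -> List[str]:
    """Parse the input, acquire DCDL data for every step, then execute in order."""
    sequence = parse_instructions(text)
    dcdl = [acquire_dcdl(step, dcdl_db) for step in sequence]
    for step, data in zip(sequence, dcdl):
        execute(step, data)
    return sequence
```

For example, "Open settings, then tap Wi-Fi and then turn on Wi-Fi" yields the three-step sequence "open settings", "tap wi-fi", "turn on wi-fi", which are executed in that order from a single user input.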
It should be understood that, in the embodiments of the present application, the input unit 104 may include a graphics processing unit (GPU) 1041 and a microphone 1042; the graphics processor 1041 processes image data of still pictures or video obtained by an image capture apparatus (such as a camera) in a video capture mode or an image capture mode. The display unit 106 may include a display panel 1061, which may be configured in the form of a liquid crystal display, an organic light-emitting diode display, or the like. The user input unit 107 includes at least one of a touch panel 1071 and other input devices 1072. The touch panel 1071, also referred to as a touch screen, may include two parts: a touch detection apparatus and a touch controller. The other input devices 1072 may include, but are not limited to, a physical keyboard, function keys (such as volume control keys and on/off keys), a trackball, a mouse, and a joystick, which are not described in detail herein.
The memory 109 may be configured to store software programs and various data. The memory 109 may mainly include a first storage area storing programs or instructions and a second storage area storing data, where the first storage area may store an operating system, and an application program or instructions required by at least one function (such as a sound playing function or an image playing function). Further, the memory 109 may include a volatile memory or a nonvolatile memory, or may include both. The nonvolatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), a static RAM (SRAM), a dynamic RAM (DRAM), a synchronous DRAM (SDRAM), a double data rate SDRAM (DDR SDRAM), an enhanced SDRAM (ESDRAM), a SyncLink DRAM (SLDRAM), or a direct Rambus RAM (DRRAM). The memory 109 in the embodiments of the present application includes, but is not limited to, these and any other suitable types of memory.
The processor 110 may include one or more processing units. Optionally, the processor 110 integrates an application processor and a modem processor, where the application processor mainly handles the operating system, the user interface, application programs, and the like, and the modem processor, such as a baseband processor, mainly handles wireless communication signals. It can be understood that the modem processor may alternatively not be integrated into the processor 110.
An embodiment of the present application further provides a readable storage medium storing a program or instructions which, when executed by a processor, implement each process of the foregoing operation execution method embodiments and achieve the same technical effects. To avoid repetition, details are not described herein again.
The processor is the processor in the electronic device described in the foregoing embodiments. The readable storage medium includes a computer-readable storage medium, such as a computer read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
An embodiment of the present application further provides a chip, including a processor and a communication interface coupled to the processor, where the processor is configured to run a program or instructions to implement each process of the foregoing operation execution method embodiments and achieve the same technical effects. To avoid repetition, details are not described herein again.
It should be understood that the chip mentioned in the embodiments of the present application may also be referred to as a system-level chip, a system chip, a chip system, or a system-on-chip.
An embodiment of the present application provides a computer program product stored in a storage medium, where the program product is executed by at least one processor to implement each process of the foregoing operation execution method embodiments and achieve the same technical effects. To avoid repetition, details are not described herein again.
It should be noted that, in this document, the terms "comprise", "include", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or apparatus that includes a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element preceded by "comprises a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes the element. Furthermore, it should be noted that the scope of the methods and apparatuses in the embodiments of the present application is not limited to performing the functions in the order shown or discussed, and may also include performing the functions in a substantially simultaneous manner or in reverse order depending on the functions involved; for example, the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. Additionally, features described with reference to some examples may be combined in other examples.
From the foregoing description of the embodiments, a person skilled in the art can clearly understand that the methods of the foregoing embodiments may be implemented by software plus a necessary general-purpose hardware platform, or by hardware; in many cases, however, the former is the preferred implementation. Based on such an understanding, the technical solutions of the present application essentially, or the part contributing to the prior art, may be embodied in the form of a computer software product. The computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions for causing a terminal (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present application.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the foregoing specific embodiments, which are merely illustrative rather than restrictive. A person of ordinary skill in the art may devise many other forms without departing from the spirit of the present application and the scope protected by the claims, all of which fall within the protection of the present application.